Message ID | 20220422150507.222488-5-namhyung@kernel.org (mailing list archive)
---|---
State | Not Applicable
Series | perf record: Implement off-cpu profiling with BPF (v1)

Context | Check | Description
---|---|---
netdev/tree_selection | success | Not a local patch
On Fri, Apr 22, 2022 at 3:49 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Recently sched_switch tracepoint added a new argument for prev_state,
> but it's hard to handle the change in a BPF program. Instead, we can
> check the function prototype in BTF before loading the program.
>
> Thus I make two copies of the tracepoint handler and select one based
> on the BTF info.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/bpf_off_cpu.c          | 32 +++++++++++++++
>  tools/perf/util/bpf_skel/off_cpu.bpf.c | 55 ++++++++++++++++++++------
>  2 files changed, 76 insertions(+), 11 deletions(-)
>

[...]

> +SEC("tp_btf/sched_switch")
> +int on_switch3(u64 *ctx)
> +{
> +	struct task_struct *prev, *next;
> +	int state;
> +
> +	if (!enabled)
> +		return 0;
> +
> +	/*
> +	 * TP_PROTO(bool preempt, struct task_struct *prev,
> +	 *	    struct task_struct *next)
> +	 */
> +	prev = (struct task_struct *)ctx[1];
> +	next = (struct task_struct *)ctx[2];

you don't have to have two BPF programs for this, you can use
read-only variable to make this choice.

On BPF side

const volatile bool has_prev_state = false;

...

if (has_prev_state) {
    prev = (struct task_struct *)ctx[2];
    next = (struct task_struct *)ctx[3];
} else {
    prev = (struct task_struct *)ctx[1];
    next = (struct task_struct *)ctx[2];
}

And from user-space side you do your detection and before skeleton is loaded:

skel->rodata->has_prev_state = <whatever you detected>

But I'm still hoping that this prev_state argument can be moved to the
end ([0]) to make all this unnecessary.

[0] https://lore.kernel.org/lkml/93a20759600c05b6d9e4359a1517c88e06b44834.camel@fb.com/

> +
> +	state = get_task_state(prev);
> +
> +	return on_switch(ctx, prev, next, state);
> +}
> +

[...]
Hello,

On Tue, Apr 26, 2022 at 4:55 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Apr 22, 2022 at 3:49 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Recently sched_switch tracepoint added a new argument for prev_state,
> > but it's hard to handle the change in a BPF program. Instead, we can
> > check the function prototype in BTF before loading the program.
> >
> > Thus I make two copies of the tracepoint handler and select one based
> > on the BTF info.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/util/bpf_off_cpu.c          | 32 +++++++++++++++
> >  tools/perf/util/bpf_skel/off_cpu.bpf.c | 55 ++++++++++++++++++++------
> >  2 files changed, 76 insertions(+), 11 deletions(-)
> >
> > [...]
> >
> > +SEC("tp_btf/sched_switch")
> > +int on_switch3(u64 *ctx)
> > +{
> > +	struct task_struct *prev, *next;
> > +	int state;
> > +
> > +	if (!enabled)
> > +		return 0;
> > +
> > +	/*
> > +	 * TP_PROTO(bool preempt, struct task_struct *prev,
> > +	 *	    struct task_struct *next)
> > +	 */
> > +	prev = (struct task_struct *)ctx[1];
> > +	next = (struct task_struct *)ctx[2];
>
> you don't have to have two BPF programs for this, you can use
> read-only variable to make this choice.
>
> On BPF side
>
> const volatile bool has_prev_state = false;
>
> ...
>
> if (has_prev_state) {
>     prev = (struct task_struct *)ctx[2];
>     next = (struct task_struct *)ctx[3];
> } else {
>     prev = (struct task_struct *)ctx[1];
>     next = (struct task_struct *)ctx[2];
> }
>
> And from user-space side you do your detection and before skeleton is loaded:
>
> skel->rodata->has_prev_state = <whatever you detected>

Nice, thanks for the tip!

Actually I tried something similar but it was with a variable (in bss)
so the verifier in an old kernel rejected it due to invalid arg access.

I guess now the const makes the verifier ignore the branch as if
it's dead but the compiler still generates the code, right?

>
> But I'm still hoping that this prev_state argument can be moved to the
> end ([0]) to make all this unnecessary.
>
> [0] https://lore.kernel.org/lkml/93a20759600c05b6d9e4359a1517c88e06b44834.camel@fb.com/

Yeah, that would make life easier. :)

Thanks,
Namhyung
On Wed, Apr 27, 2022 at 11:15 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> On Tue, Apr 26, 2022 at 4:55 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, Apr 22, 2022 at 3:49 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > Recently sched_switch tracepoint added a new argument for prev_state,
> > > but it's hard to handle the change in a BPF program. Instead, we can
> > > check the function prototype in BTF before loading the program.
> > >
> > > Thus I make two copies of the tracepoint handler and select one based
> > > on the BTF info.
> > >
> > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > ---
> > >  tools/perf/util/bpf_off_cpu.c          | 32 +++++++++++++++
> > >  tools/perf/util/bpf_skel/off_cpu.bpf.c | 55 ++++++++++++++++++++------
> > >  2 files changed, 76 insertions(+), 11 deletions(-)
> > >
> > > [...]
> > >
> > > +SEC("tp_btf/sched_switch")
> > > +int on_switch3(u64 *ctx)
> > > +{
> > > +	struct task_struct *prev, *next;
> > > +	int state;
> > > +
> > > +	if (!enabled)
> > > +		return 0;
> > > +
> > > +	/*
> > > +	 * TP_PROTO(bool preempt, struct task_struct *prev,
> > > +	 *	    struct task_struct *next)
> > > +	 */
> > > +	prev = (struct task_struct *)ctx[1];
> > > +	next = (struct task_struct *)ctx[2];
> >
> > you don't have to have two BPF programs for this, you can use
> > read-only variable to make this choice.
> >
> > On BPF side
> >
> > const volatile bool has_prev_state = false;
> >
> > ...
> >
> > if (has_prev_state) {
> >     prev = (struct task_struct *)ctx[2];
> >     next = (struct task_struct *)ctx[3];
> > } else {
> >     prev = (struct task_struct *)ctx[1];
> >     next = (struct task_struct *)ctx[2];
> > }
> >
> > And from user-space side you do your detection and before skeleton is loaded:
> >
> > skel->rodata->has_prev_state = <whatever you detected>
>
> Nice, thanks for the tip!
>
> Actually I tried something similar but it was with a variable (in bss)
> so the verifier in an old kernel rejected it due to invalid arg access.
>
> I guess now the const makes the verifier ignore the branch as if
> it's dead but the compiler still generates the code, right?

yes, exactly

>
> >
> > But I'm still hoping that this prev_state argument can be moved to the
> > end ([0]) to make all this unnecessary.
> >
> > [0] https://lore.kernel.org/lkml/93a20759600c05b6d9e4359a1517c88e06b44834.camel@fb.com/
>
> Yeah, that would make life easier. :)
>
> Thanks,
> Namhyung
On Wed, Apr 27, 2022 at 12:26 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Apr 27, 2022 at 11:15 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > Actually I tried something similar but it was with a variable (in bss)
> > so the verifier in an old kernel rejected it due to invalid arg access.
> >
> > I guess now the const makes the verifier ignore the branch as if
> > it's dead but the compiler still generates the code, right?
>
> yes, exactly

Then I'm curious how it'd work on newer kernels.
The verifier sees the false branch and detects type mismatch
for the second argument then it'd reject the program?

Thanks,
Namhyung
On Thu, Apr 28, 2022 at 4:58 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Apr 27, 2022 at 12:26 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Apr 27, 2022 at 11:15 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > Actually I tried something similar but it was with a variable (in bss)
> > > so the verifier in an old kernel rejected it due to invalid arg access.
> > >
> > > I guess now the const makes the verifier ignore the branch as if
> > > it's dead but the compiler still generates the code, right?
> >
> > yes, exactly
>
> Then I'm curious how it'd work on newer kernels.
> The verifier sees the false branch and detects type mismatch
> for the second argument then it'd reject the program?

Verifier will know which branch is never taken, and will just ignore
and remove any code in it as dead code.

> Thanks,
> Namhyung
diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index 89f36229041d..38aeb13d3d25 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -86,6 +86,37 @@ static void off_cpu_finish(void *arg __maybe_unused)
 	off_cpu_bpf__destroy(skel);
 }
 
+/* recent kernel added prev_state arg, so it needs to call the proper function */
+static void check_sched_switch_args(void)
+{
+	const struct btf *btf = bpf_object__btf(skel->obj);
+	const struct btf_type *t1, *t2, *t3;
+	u32 type_id;
+
+	type_id = btf__find_by_name_kind(btf, "bpf_trace_sched_switch",
+					 BTF_KIND_TYPEDEF);
+	if ((s32)type_id < 0)
+		goto old_format;
+
+	t1 = btf__type_by_id(btf, type_id);
+	if (t1 == NULL)
+		goto old_format;
+
+	t2 = btf__type_by_id(btf, t1->type);
+	if (t2 == NULL || !btf_is_ptr(t2))
+		goto old_format;
+
+	t3 = btf__type_by_id(btf, t2->type);
+	if (t3 && btf_is_func_proto(t3) && btf_vlen(t3) == 4) {
+		/* new format: disable old functions */
+		bpf_program__set_autoload(skel->progs.on_switch3, false);
+		return;
+	}
+
+old_format:
+	bpf_program__set_autoload(skel->progs.on_switch4, false);
+}
+
 int off_cpu_prepare(struct evlist *evlist, struct target *target)
 {
 	int err, fd, i;
@@ -114,6 +145,7 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target)
 	}
 
 	set_max_rlimit();
+	check_sched_switch_args();
 
 	err = off_cpu_bpf__load(skel);
 	if (err) {
diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
index 27425fe361e2..e11e198af86f 100644
--- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
+++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
@@ -121,22 +121,13 @@ static inline int can_record(struct task_struct *t, int state)
 	return 1;
 }
 
-SEC("tp_btf/sched_switch")
-int on_switch(u64 *ctx)
+static int on_switch(u64 *ctx, struct task_struct *prev,
+		     struct task_struct *next, int state)
 {
 	__u64 ts;
-	int state;
 	__u32 pid, stack_id;
-	struct task_struct *prev, *next;
 	struct tstamp_data elem, *pelem;
 
-	if (!enabled)
-		return 0;
-
-	prev = (struct task_struct *)ctx[1];
-	next = (struct task_struct *)ctx[2];
-	state = get_task_state(prev);
-
 	ts = bpf_ktime_get_ns();
 
 	if (!can_record(prev, state))
@@ -178,4 +169,46 @@ int on_switch(u64 *ctx)
 	return 0;
 }
 
+SEC("tp_btf/sched_switch")
+int on_switch3(u64 *ctx)
+{
+	struct task_struct *prev, *next;
+	int state;
+
+	if (!enabled)
+		return 0;
+
+	/*
+	 * TP_PROTO(bool preempt, struct task_struct *prev,
+	 *	    struct task_struct *next)
+	 */
+	prev = (struct task_struct *)ctx[1];
+	next = (struct task_struct *)ctx[2];
+
+	state = get_task_state(prev);
+
+	return on_switch(ctx, prev, next, state);
+}
+
+SEC("tp_btf/sched_switch")
+int on_switch4(u64 *ctx)
+{
+	struct task_struct *prev, *next;
+	int prev_state;
+
+	if (!enabled)
+		return 0;
+
+	/*
+	 * TP_PROTO(bool preempt, int prev_state,
+	 *	    struct task_struct *prev,
+	 *	    struct task_struct *next)
+	 */
+	prev = (struct task_struct *)ctx[2];
+	next = (struct task_struct *)ctx[3];
+	prev_state = (int)ctx[1];
+
+	return on_switch(ctx, prev, next, prev_state);
+}
+
 char LICENSE[] SEC("license") = "Dual BSD/GPL";
Recently sched_switch tracepoint added a new argument for prev_state,
but it's hard to handle the change in a BPF program. Instead, we can
check the function prototype in BTF before loading the program.

Thus I make two copies of the tracepoint handler and select one based
on the BTF info.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/bpf_off_cpu.c          | 32 +++++++++++++++
 tools/perf/util/bpf_skel/off_cpu.bpf.c | 55 ++++++++++++++++++++------
 2 files changed, 76 insertions(+), 11 deletions(-)