diff mbox series

[4/4] perf record: Handle argument change in sched_switch

Message ID 20220422150507.222488-5-namhyung@kernel.org (mailing list archive)
State Not Applicable
Series perf record: Implement off-cpu profiling with BPF (v1)

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Namhyung Kim April 22, 2022, 3:05 p.m. UTC
Recently sched_switch tracepoint added a new argument for prev_state,
but it's hard to handle the change in a BPF program.  Instead, we can
check the function prototype in BTF before loading the program.

Thus I make two copies of the tracepoint handler and select one based
on the BTF info.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/bpf_off_cpu.c          | 32 +++++++++++++++
 tools/perf/util/bpf_skel/off_cpu.bpf.c | 55 ++++++++++++++++++++------
 2 files changed, 76 insertions(+), 11 deletions(-)

Comments

Andrii Nakryiko April 26, 2022, 11:55 p.m. UTC | #1
On Fri, Apr 22, 2022 at 3:49 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Recently sched_switch tracepoint added a new argument for prev_state,
> but it's hard to handle the change in a BPF program.  Instead, we can
> check the function prototype in BTF before loading the program.
>
> Thus I make two copies of the tracepoint handler and select one based
> on the BTF info.
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/util/bpf_off_cpu.c          | 32 +++++++++++++++
>  tools/perf/util/bpf_skel/off_cpu.bpf.c | 55 ++++++++++++++++++++------
>  2 files changed, 76 insertions(+), 11 deletions(-)
>

[...]

>
> +SEC("tp_btf/sched_switch")
> +int on_switch3(u64 *ctx)
> +{
> +       struct task_struct *prev, *next;
> +       int state;
> +
> +       if (!enabled)
> +               return 0;
> +
> +       /*
> +        * TP_PROTO(bool preempt, struct task_struct *prev,
> +        *          struct task_struct *next)
> +        */
> +       prev = (struct task_struct *)ctx[1];
> +       next = (struct task_struct *)ctx[2];


you don't have to have two BPF programs for this, you can use
read-only variable to make this choice.

On BPF side

const volatile bool has_prev_state = false;

...

if (has_prev_state) {
    prev = (struct task_struct *)ctx[2];
    next = (struct task_struct *)ctx[3];
} else {
    prev = (struct task_struct *)ctx[1];
    next = (struct task_struct *)ctx[2];
}


And from user-space side you do your detection and before skeleton is loaded:

skel->rodata->has_prev_state = <whatever you detected>

But I'm still hoping that this prev_state argument can be moved to the
end ([0]) to make all this unnecessary.

  [0] https://lore.kernel.org/lkml/93a20759600c05b6d9e4359a1517c88e06b44834.camel@fb.com/


> +
> +       state = get_task_state(prev);
> +
> +       return on_switch(ctx, prev, next, state);
> +}
> +

[...]
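The index shift Andrii describes can be checked outside the kernel. The following is a hypothetical plain-C model of the ctx[] slot selection, not the actual BPF program; `get_prev`/`get_next` and the stand-in `struct task_struct` are invented here for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

struct task_struct { int pid; };  /* stand-in for the kernel struct */

/* tp_btf programs see the tracepoint arguments as a raw u64 array.
 * With the new
 *   TP_PROTO(bool preempt, int prev_state,
 *            struct task_struct *prev, struct task_struct *next)
 * the prev/next pointers sit one slot further right than with the old
 * three-argument prototype; that is exactly what the proposed
 * `const volatile bool has_prev_state` knob selects between. */
static struct task_struct *get_prev(const uint64_t *ctx, bool has_prev_state)
{
	return (struct task_struct *)(uintptr_t)ctx[has_prev_state ? 2 : 1];
}

static struct task_struct *get_next(const uint64_t *ctx, bool has_prev_state)
{
	return (struct task_struct *)(uintptr_t)ctx[has_prev_state ? 3 : 2];
}
```

Feeding both argument layouts through these helpers confirms that the same tasks come back out regardless of which prototype the kernel uses.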
Namhyung Kim April 27, 2022, 6:14 p.m. UTC | #2
Hello,

On Tue, Apr 26, 2022 at 4:55 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Apr 22, 2022 at 3:49 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Recently sched_switch tracepoint added a new argument for prev_state,
> > but it's hard to handle the change in a BPF program.  Instead, we can
> > check the function prototype in BTF before loading the program.
> >
> > Thus I make two copies of the tracepoint handler and select one based
> > on the BTF info.
> >
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/util/bpf_off_cpu.c          | 32 +++++++++++++++
> >  tools/perf/util/bpf_skel/off_cpu.bpf.c | 55 ++++++++++++++++++++------
> >  2 files changed, 76 insertions(+), 11 deletions(-)
> >
>
> [...]
>
> >
> > +SEC("tp_btf/sched_switch")
> > +int on_switch3(u64 *ctx)
> > +{
> > +       struct task_struct *prev, *next;
> > +       int state;
> > +
> > +       if (!enabled)
> > +               return 0;
> > +
> > +       /*
> > +        * TP_PROTO(bool preempt, struct task_struct *prev,
> > +        *          struct task_struct *next)
> > +        */
> > +       prev = (struct task_struct *)ctx[1];
> > +       next = (struct task_struct *)ctx[2];
>
>
> you don't have to have two BPF programs for this, you can use
> read-only variable to make this choice.
>
> On BPF side
>
> const volatile bool has_prev_state = false;
>
> ...
>
> if (has_prev_state) {
>     prev = (struct task_struct *)ctx[2];
>     next = (struct task_struct *)ctx[3];
> } else {
>     prev = (struct task_struct *)ctx[1];
>     next = (struct task_struct *)ctx[2];
> }
>
>
> And from user-space side you do your detection and before skeleton is loaded:
>
> skel->rodata->has_prev_state = <whatever you detected>

Nice, thanks for the tip!

Actually I tried something similar but it was with a variable (in bss)
so the verifier in an old kernel rejected it due to invalid arg access.

I guess now the const makes the verifier ignore the branch as if
it's dead but the compiler still generates the code, right?

>
> But I'm still hoping that this prev_state argument can be moved to the
> end ([0]) to make all this unnecessary.
>
>   [0] https://lore.kernel.org/lkml/93a20759600c05b6d9e4359a1517c88e06b44834.camel@fb.com/

Yeah, that would make life easier. :)

Thanks,
Namhyung
Andrii Nakryiko April 27, 2022, 7:26 p.m. UTC | #3
On Wed, Apr 27, 2022 at 11:15 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> On Tue, Apr 26, 2022 at 4:55 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Fri, Apr 22, 2022 at 3:49 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > Recently sched_switch tracepoint added a new argument for prev_state,
> > > but it's hard to handle the change in a BPF program.  Instead, we can
> > > check the function prototype in BTF before loading the program.
> > >
> > > Thus I make two copies of the tracepoint handler and select one based
> > > on the BTF info.
> > >
> > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > ---
> > >  tools/perf/util/bpf_off_cpu.c          | 32 +++++++++++++++
> > >  tools/perf/util/bpf_skel/off_cpu.bpf.c | 55 ++++++++++++++++++++------
> > >  2 files changed, 76 insertions(+), 11 deletions(-)
> > >
> >
> > [...]
> >
> > >
> > > +SEC("tp_btf/sched_switch")
> > > +int on_switch3(u64 *ctx)
> > > +{
> > > +       struct task_struct *prev, *next;
> > > +       int state;
> > > +
> > > +       if (!enabled)
> > > +               return 0;
> > > +
> > > +       /*
> > > +        * TP_PROTO(bool preempt, struct task_struct *prev,
> > > +        *          struct task_struct *next)
> > > +        */
> > > +       prev = (struct task_struct *)ctx[1];
> > > +       next = (struct task_struct *)ctx[2];
> >
> >
> > you don't have to have two BPF programs for this, you can use
> > read-only variable to make this choice.
> >
> > On BPF side
> >
> > const volatile bool has_prev_state = false;
> >
> > ...
> >
> > if (has_prev_state) {
> >     prev = (struct task_struct *)ctx[2];
> >     next = (struct task_struct *)ctx[3];
> > } else {
> >     prev = (struct task_struct *)ctx[1];
> >     next = (struct task_struct *)ctx[2];
> > }
> >
> >
> > And from user-space side you do your detection and before skeleton is loaded:
> >
> > skel->rodata->has_prev_state = <whatever you detected>
>
> Nice, thanks for the tip!
>
> Actually I tried something similar but it was with a variable (in bss)
> so the verifier in an old kernel rejected it due to invalid arg access.
>
> I guess now the const makes the verifier ignore the branch as if
> it's dead but the compiler still generates the code, right?


yes, exactly

>
> >
> > But I'm still hoping that this prev_state argument can be moved to the
> > end ([0]) to make all this unnecessary.
> >
> >   [0] https://lore.kernel.org/lkml/93a20759600c05b6d9e4359a1517c88e06b44834.camel@fb.com/
>
> Yeah, that would make life easier. :)
>
> Thanks,
> Namhyung
Namhyung Kim April 28, 2022, 11:58 p.m. UTC | #4
On Wed, Apr 27, 2022 at 12:26 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Apr 27, 2022 at 11:15 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > Actually I tried something similar but it was with a variable (in bss)
> > so the verifier in an old kernel rejected it due to invalid arg access.
> >
> > I guess now the const makes the verifier ignore the branch as if
> > it's dead but the compiler still generates the code, right?
>
>
> yes, exactly

Then I'm curious how it'd work on newer kernels.
If the verifier sees the false branch and detects a type mismatch
for the second argument, wouldn't it reject the program?

Thanks,
Namhyung
Andrii Nakryiko May 7, 2022, 12:14 a.m. UTC | #5
On Thu, Apr 28, 2022 at 4:58 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Apr 27, 2022 at 12:26 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Apr 27, 2022 at 11:15 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > Actually I tried something similar but it was with a variable (in bss)
> > > so the verifier in an old kernel rejected it due to invalid arg access.
> > >
> > > I guess now the const makes the verifier ignore the branch as if
> > > it's dead but the compiler still generates the code, right?
> >
> >
> > yes, exactly
>
> Then I'm curious how it'd work on newer kernels.
> If the verifier sees the false branch and detects a type mismatch
> for the second argument, wouldn't it reject the program?
>

The verifier will know which branch is never taken, and will just
remove any code in it as dead code.


> Thanks,
> Namhyung
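The pruning Andrii describes can be illustrated with an ordinary compile-time constant: once the value is fixed (as the verifier sees the `.rodata` value at load time), one arm of the branch is provably unreachable and is discarded before any checks would run against it. A plain-C analogue, purely illustrative and not verifier code:

```c
#include <stdbool.h>

/* Analogue of `const volatile bool has_prev_state` after user space has
 * set it and the skeleton is loaded: the verifier sees a fixed value in
 * .rodata, so the untaken arm below is dead code and is pruned, which
 * is why the "wrong" indexing in that arm never triggers a rejection. */
static int prev_slot(bool has_prev_state)
{
	if (has_prev_state)
		return 2;	/* new prototype: ctx[2] holds prev */
	return 1;		/* old prototype: ctx[1] holds prev */
}
```

A plain `.bss` variable does not get this treatment because its value is not known at verification time, matching the rejection Namhyung saw on an old kernel.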

Patch

diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index 89f36229041d..38aeb13d3d25 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -86,6 +86,37 @@  static void off_cpu_finish(void *arg __maybe_unused)
 	off_cpu_bpf__destroy(skel);
 }
 
+/* recent kernel added prev_state arg, so it needs to call the proper function */
+static void check_sched_switch_args(void)
+{
+	const struct btf *btf = bpf_object__btf(skel->obj);
+	const struct btf_type *t1, *t2, *t3;
+	u32 type_id;
+
+	type_id = btf__find_by_name_kind(btf, "bpf_trace_sched_switch",
+					 BTF_KIND_TYPEDEF);
+	if ((s32)type_id < 0)
+		goto old_format;
+
+	t1 = btf__type_by_id(btf, type_id);
+	if (t1 == NULL)
+		goto old_format;
+
+	t2 = btf__type_by_id(btf, t1->type);
+	if (t2 == NULL || !btf_is_ptr(t2))
+		goto old_format;
+
+	t3 = btf__type_by_id(btf, t2->type);
+	if (t3 && btf_is_func_proto(t3) && btf_vlen(t3) == 4) {
+		/* new format: disable old functions */
+		bpf_program__set_autoload(skel->progs.on_switch3, false);
+		return;
+	}
+
+old_format:
+	bpf_program__set_autoload(skel->progs.on_switch4, false);
+}
+
 int off_cpu_prepare(struct evlist *evlist, struct target *target)
 {
 	int err, fd, i;
@@ -114,6 +145,7 @@  int off_cpu_prepare(struct evlist *evlist, struct target *target)
 	}
 
 	set_max_rlimit();
+	check_sched_switch_args();
 
 	err = off_cpu_bpf__load(skel);
 	if (err) {
diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
index 27425fe361e2..e11e198af86f 100644
--- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
+++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
@@ -121,22 +121,13 @@  static inline int can_record(struct task_struct *t, int state)
 	return 1;
 }
 
-SEC("tp_btf/sched_switch")
-int on_switch(u64 *ctx)
+static int on_switch(u64 *ctx, struct task_struct *prev,
+		     struct task_struct *next, int state)
 {
 	__u64 ts;
-	int state;
 	__u32 pid, stack_id;
-	struct task_struct *prev, *next;
 	struct tstamp_data elem, *pelem;
 
-	if (!enabled)
-		return 0;
-
-	prev = (struct task_struct *)ctx[1];
-	next = (struct task_struct *)ctx[2];
-	state = get_task_state(prev);
-
 	ts = bpf_ktime_get_ns();
 
 	if (!can_record(prev, state))
@@ -178,4 +169,46 @@  int on_switch(u64 *ctx)
 	return 0;
 }
 
+SEC("tp_btf/sched_switch")
+int on_switch3(u64 *ctx)
+{
+	struct task_struct *prev, *next;
+	int state;
+
+	if (!enabled)
+		return 0;
+
+	/*
+	 * TP_PROTO(bool preempt, struct task_struct *prev,
+	 *          struct task_struct *next)
+	 */
+	prev = (struct task_struct *)ctx[1];
+	next = (struct task_struct *)ctx[2];
+
+	state = get_task_state(prev);
+
+	return on_switch(ctx, prev, next, state);
+}
+
+SEC("tp_btf/sched_switch")
+int on_switch4(u64 *ctx)
+{
+	struct task_struct *prev, *next;
+	int prev_state;
+
+	if (!enabled)
+		return 0;
+
+	/*
+	 * TP_PROTO(bool preempt, int prev_state,
+	 *          struct task_struct *prev,
+	 *          struct task_struct *next)
+	 */
+	prev = (struct task_struct *)ctx[2];
+	next = (struct task_struct *)ctx[3];
+	prev_state = (int)ctx[1];
+
+	return on_switch(ctx, prev, next, prev_state);
+}
+
 char LICENSE[] SEC("license") = "Dual BSD/GPL";
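The decision in check_sched_switch_args() reduces to: if BTF reports a four-parameter prototype for bpf_trace_sched_switch, keep on_switch4 and disable on_switch3; otherwise keep on_switch3. A small plain-C model of that rule (the helper name is hypothetical, and the real code walks BTF via libbpf rather than taking an arity directly):

```c
#include <assert.h>
#include <string.h>

/* vlen models the parameter count BTF reports for the
 * bpf_trace_sched_switch function prototype: 4 with the new
 * prev_state argument, 3 (or anything else, including a failed
 * lookup) for the old format. */
static const char *handler_to_keep(int vlen)
{
	return vlen == 4 ? "on_switch4" : "on_switch3";
}
```

Note that the patch treats every lookup failure as the old format, so on old kernels without the typedef in BTF the three-argument handler is still loaded.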