| Message ID | 20210726161211.925206-6-andrii@kernel.org (mailing list archive) |
|---|---|
| State | Changes Requested |
| Delegated to | BPF |
| Series | BPF perf link and user-provided context value |
On Mon, Jul 26, 2021 at 09:12:02AM -0700, Andrii Nakryiko wrote:
> Add ability for users to specify custom u64 value when creating BPF link for
> perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints).

If I read this right, the value is dependent on the link, not the
program. In which case:

> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 2d510ad750ed..97ab46802800 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -762,6 +762,7 @@ struct perf_event {
>  #ifdef CONFIG_BPF_SYSCALL
>  	perf_overflow_handler_t orig_overflow_handler;
>  	struct bpf_prog		*prog;
> +	u64			user_ctx;
>  #endif
>
>  #ifdef CONFIG_EVENT_TRACING

> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 8ac92560d3a3..4543852f1480 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -675,7 +675,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
>
>  #ifdef CONFIG_BPF_EVENTS
>  unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
> -int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
> +int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 user_ctx);

This API would be misleading, because it is about setting the program.

>  void perf_event_detach_bpf_prog(struct perf_event *event);
>  int perf_event_query_prog_array(struct perf_event *event, void __user *info);
>  int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);

> @@ -9966,6 +9968,7 @@ static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog
>  	}
>
>  	event->prog = prog;
> +	event->user_ctx = user_ctx;
>  	event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
>  	WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
>  	return 0;

Also, the name @user_ctx is a bit confusing. Would something like
@bpf_cookie or somesuch not be a better name?

Combined would it not make more sense to add something like:

  extern int perf_event_set_bpf_cookie(struct perf_event *event, u64 cookie);
On Tue, Jul 27, 2021 at 2:14 AM Peter Zijlstra <peterz@infradead.org> wrote: > > On Mon, Jul 26, 2021 at 09:12:02AM -0700, Andrii Nakryiko wrote: > > Add ability for users to specify custom u64 value when creating BPF link for > > perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints). > > If I read this right, the value is dependent on the link, not the > program. In which case: You can see it both ways. BPF link in this (and at least few other cases) is just this invisible orchestrator of BPF program attachment/detachment. The underlying perf_event subsystem doesn't know about the existence of the BPF link at all. In the end, it's actually struct bpf_prog that is added to perf_event or into tp's bpf_prog_array list, and this user-provided value (bpf cookie per below) is associated with that particular attachment. So when we call trace_call_bpf() from tracepoint or kprobe/uprobe, there is no BPF link anywhere, it's just a list of bpf_prog_array_items, with bpf_prog pointer and associated user value. Note, exactly the same bpf_prog can be attached to another perf_event with a completely different cookie and that's expected and is fine. So in short, perf_event just needs to know about attaching/detaching bpf_prog pointer (and this cookie), it doesn't need to know about bpf_link. Everything is handled the same regardless if bpf_link is used to attach or ioctl(PERF_EVENT_IOC_SET_BPF). > > > diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h > > index 2d510ad750ed..97ab46802800 100644 > > --- a/include/linux/perf_event.h > > +++ b/include/linux/perf_event.h > > @@ -762,6 +762,7 @@ struct perf_event { > > #ifdef CONFIG_BPF_SYSCALL > > perf_overflow_handler_t orig_overflow_handler; > > struct bpf_prog *prog; > > + u64 user_ctx; > > #endif > > > > #ifdef CONFIG_EVENT_TRACING > > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h > > index 8ac92560d3a3..4543852f1480 100644 > > --- a/include/linux/trace_events.h > > +++ b/include/linux/trace_events.h > > @@ -675,7 +675,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file) > > > > #ifdef CONFIG_BPF_EVENTS > > unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx); > > -int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog); > > +int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 user_ctx); > > This API would be misleading, because it is about setting the program. Answered above, here perf_event just provides a low-level internal API for attaching bpf_prog with associated value. BPF link is a higher-level invisible concept as far as perf_event is concerned. > > > void perf_event_detach_bpf_prog(struct perf_event *event); > > int perf_event_query_prog_array(struct perf_event *event, void __user *info); > > int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog); > > > @@ -9966,6 +9968,7 @@ static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog > > } > > > > event->prog = prog; > > + event->user_ctx = user_ctx; > > event->orig_overflow_handler = READ_ONCE(event->overflow_handler); > > WRITE_ONCE(event->overflow_handler, bpf_overflow_handler); > > return 0; > > Also, the name @user_ctx is a bit confusing. Would something like > @bpf_cookie or somesuch not be a better name? I struggled to come up with a good name, user_ctx was the best I could do. But I do like bpf_cookie for this, thank you! I'll switch the terminology in the next revision. 
>
> Combined would it not make more sense to add something like:
>
>   extern int perf_event_set_bpf_cookie(struct perf_event *event, u64 cookie);

Passing that user_ctx along with the bpf_prog makes it clear that they go
together and that user_ctx is immutable once set. I don't actually plan to
allow updating this cookie value.
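For readers following the thread, here is a minimal sketch of the per-attachment pairing Andrii describes, reconstructed from the commit message's description of struct bpf_prog_array_item (this is not the verbatim patch hunk; the member names, including the reused cgroup_storage space, follow that description and may differ slightly from the actual patch):

```c
/* Each entry in a perf_event's / tracepoint's prog array pairs the attached
 * program with the value supplied at attach time, so the very same bpf_prog
 * can appear in several arrays with different cookies.
 */
struct bpf_prog_array_item {
	struct bpf_prog *prog;
	union {
		/* cgroup BPF programs keep using this space ... */
		struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
		/* ... perf_event-backed programs store the attach-time value here */
		u64 user_ctx;	/* "bpf_cookie" in later revisions */
	};
};
```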
On Tue, Jul 27, 2021 at 02:09:08PM -0700, Andrii Nakryiko wrote: > On Tue, Jul 27, 2021 at 2:14 AM Peter Zijlstra <peterz@infradead.org> wrote: > > > > On Mon, Jul 26, 2021 at 09:12:02AM -0700, Andrii Nakryiko wrote: > > > Add ability for users to specify custom u64 value when creating BPF link for > > > perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints). > > > > If I read this right, the value is dependent on the link, not the > > program. In which case: > > You can see it both ways. BPF link in this (and at least few other > cases) is just this invisible orchestrator of BPF program > attachment/detachment. The underlying perf_event subsystem doesn't > know about the existence of the BPF link at all. In the end, it's > actually struct bpf_prog that is added to perf_event or into tp's > bpf_prog_array list, and this user-provided value (bpf cookie per > below) is associated with that particular attachment. So when we call > trace_call_bpf() from tracepoint or kprobe/uprobe, there is no BPF > link anywhere, it's just a list of bpf_prog_array_items, with bpf_prog > pointer and associated user value. Note, exactly the same bpf_prog can > be attached to another perf_event with a completely different cookie > and that's expected and is fine. > > So in short, perf_event just needs to know about attaching/detaching > bpf_prog pointer (and this cookie), it doesn't need to know about > bpf_link. Everything is handled the same regardless if bpf_link is > used to attach or ioctl(PERF_EVENT_IOC_SET_BPF). OK, fair enough I suppose. > > > @@ -9966,6 +9968,7 @@ static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog > > > } > > > > > > event->prog = prog; > > > + event->user_ctx = user_ctx; > > > event->orig_overflow_handler = READ_ONCE(event->overflow_handler); > > > WRITE_ONCE(event->overflow_handler, bpf_overflow_handler); > > > return 0; > > > > Also, the name @user_ctx is a bit confusing. Would something like > > @bpf_cookie or somesuch not be a better name? > > I struggled to come up with a good name, user_ctx was the best I could > do. But I do like bpf_cookie for this, thank you! I'll switch the > terminology in the next revision. y/w :-) Thanks!
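For context, a hedged sketch of how a BPF program would consume the value at run time using the helper this series describes (called bpf_get_user_ctx() in this revision and expected to be renamed along with the bpf_cookie terminology; the exact signature is an assumption based on the patch description, so this will not build against mainline headers that predate or postdate this series):

```c
// SPDX-License-Identifier: GPL-2.0
/* Hypothetical kprobe program: one program object, many attachments, each
 * distinguished only by the cookie supplied at link-creation time.
 */
#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>

SEC("kprobe")
int handle_kprobe(struct pt_regs *ctx)
{
	/* per-attachment value set when the perf link was created */
	__u64 cookie = bpf_get_user_ctx(ctx);

	bpf_printk("kprobe fired, attachment cookie %llu", cookie);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

The same program object would be attached to each kprobe/uprobe/tracepoint with a different cookie, which is what makes the function-IP-independent lookup described in the commit message possible.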
On 7/26/21 9:12 AM, Andrii Nakryiko wrote: > Add ability for users to specify custom u64 value when creating BPF link for > perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints). > > This is useful for cases when the same BPF program is used for attaching and > processing invocation of different tracepoints/kprobes/uprobes in a generic > fashion, but such that each invocation is distinguished from each other (e.g., > BPF program can look up additional information associated with a specific > kernel function without having to rely on function IP lookups). This enables > new use cases to be implemented simply and efficiently that previously were > possible only through code generation (and thus multiple instances of almost > identical BPF program) or compilation at runtime (BCC-style) on target hosts > (even more expensive resource-wise). For uprobes it is not even possible in > some cases to know function IP before hand (e.g., when attaching to shared > library without PID filtering, in which case base load address is not known > for a library). > > This is done by storing u64 user_ctx in struct bpf_prog_array_item, > corresponding to each attached and run BPF program. Given cgroup BPF programs > already use 2 8-byte pointers for their needs and cgroup BPF programs don't > have (yet?) support for user_ctx, reuse that space through union of > cgroup_storage and new user_ctx field. > > Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx. > This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF > program execution code, which luckily is now also split from > BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper > giving access to this user context value from inside a BPF program. Generic > perf_event BPF programs will access this value from perf_event itself through > passed in BPF program context. > > Cc: Peter Zijlstra <peterz@infradead.org> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org> > --- > drivers/media/rc/bpf-lirc.c | 4 ++-- > include/linux/bpf.h | 16 +++++++++++++++- > include/linux/perf_event.h | 1 + > include/linux/trace_events.h | 6 +++--- > include/uapi/linux/bpf.h | 7 +++++++ > kernel/bpf/core.c | 29 ++++++++++++++++++----------- > kernel/bpf/syscall.c | 2 +- > kernel/events/core.c | 21 ++++++++++++++------- > kernel/trace/bpf_trace.c | 8 +++++--- > tools/include/uapi/linux/bpf.h | 7 +++++++ > 10 files changed, 73 insertions(+), 28 deletions(-) > > diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c > index afae0afe3f81..7490494273e4 100644 > --- a/drivers/media/rc/bpf-lirc.c > +++ b/drivers/media/rc/bpf-lirc.c > @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog) > goto unlock; > } > > - ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array); > + ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array); > if (ret < 0) > goto unlock; > [...] 
>   void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 00b1267ab4f0..bc1fd54a8f58 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1448,6 +1448,13 @@ union bpf_attr {
>  			__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
>  			__u32		iter_info_len;	/* iter_info length */
>  		};
> +		struct {
> +			/* black box user-provided value passed through
> +			 * to BPF program at the execution time and
> +			 * accessible through bpf_get_user_ctx() BPF helper
> +			 */
> +			__u64		user_ctx;
> +		} perf_event;

Is it possible to fold this field into the previous union?

	union {
		__u32		target_btf_id;	/* btf_id of target to attach to */
		struct {
			__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
			__u32		iter_info_len;	/* iter_info length */
		};
	};

>  	};
>  } link_create;
>
[...]
On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@fb.com> wrote: > > > > On 7/26/21 9:12 AM, Andrii Nakryiko wrote: > > Add ability for users to specify custom u64 value when creating BPF link for > > perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints). > > > > This is useful for cases when the same BPF program is used for attaching and > > processing invocation of different tracepoints/kprobes/uprobes in a generic > > fashion, but such that each invocation is distinguished from each other (e.g., > > BPF program can look up additional information associated with a specific > > kernel function without having to rely on function IP lookups). This enables > > new use cases to be implemented simply and efficiently that previously were > > possible only through code generation (and thus multiple instances of almost > > identical BPF program) or compilation at runtime (BCC-style) on target hosts > > (even more expensive resource-wise). For uprobes it is not even possible in > > some cases to know function IP before hand (e.g., when attaching to shared > > library without PID filtering, in which case base load address is not known > > for a library). > > > > This is done by storing u64 user_ctx in struct bpf_prog_array_item, > > corresponding to each attached and run BPF program. Given cgroup BPF programs > > already use 2 8-byte pointers for their needs and cgroup BPF programs don't > > have (yet?) support for user_ctx, reuse that space through union of > > cgroup_storage and new user_ctx field. > > > > Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx. > > This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF > > program execution code, which luckily is now also split from > > BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper > > giving access to this user context value from inside a BPF program. Generic > > perf_event BPF programs will access this value from perf_event itself through > > passed in BPF program context. > > > > Cc: Peter Zijlstra <peterz@infradead.org> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org> > > --- > > drivers/media/rc/bpf-lirc.c | 4 ++-- > > include/linux/bpf.h | 16 +++++++++++++++- > > include/linux/perf_event.h | 1 + > > include/linux/trace_events.h | 6 +++--- > > include/uapi/linux/bpf.h | 7 +++++++ > > kernel/bpf/core.c | 29 ++++++++++++++++++----------- > > kernel/bpf/syscall.c | 2 +- > > kernel/events/core.c | 21 ++++++++++++++------- > > kernel/trace/bpf_trace.c | 8 +++++--- > > tools/include/uapi/linux/bpf.h | 7 +++++++ > > 10 files changed, 73 insertions(+), 28 deletions(-) > > > > diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c > > index afae0afe3f81..7490494273e4 100644 > > --- a/drivers/media/rc/bpf-lirc.c > > +++ b/drivers/media/rc/bpf-lirc.c > > @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog) > > goto unlock; > > } > > > > - ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array); > > + ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array); > > if (ret < 0) > > goto unlock; > > > [...] 
> > void bpf_trace_run1(struct bpf_prog *prog, u64 arg1); > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index 00b1267ab4f0..bc1fd54a8f58 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -1448,6 +1448,13 @@ union bpf_attr { > > __aligned_u64 iter_info; /* extra bpf_iter_link_info */ > > __u32 iter_info_len; /* iter_info length */ > > }; > > + struct { > > + /* black box user-provided value passed through > > + * to BPF program at the execution time and > > + * accessible through bpf_get_user_ctx() BPF helper > > + */ > > + __u64 user_ctx; > > + } perf_event; > > Is it possible to fold this field into previous union? > > union { > __u32 target_btf_id; /* btf_id of > target to attach to */ > struct { > __aligned_u64 iter_info; /* > extra bpf_iter_link_info */ > __u32 iter_info_len; /* > iter_info length */ > }; > }; > > I didn't want to do it, because different types of BPF links will accept this user_ctx (or now bpf_cookie). And then we'll have to have different locations of that field for different types of links. For example, when/if we add this user_ctx to BPF iterator programs, having __u64 user_ctx in the same anonymous union will make it overlap with iter_info, which is a problem. So I want to have a link type-specific sections in LINK_CREATE command section, to allow the same field name at different locations. I actually think that we should put iter_info/iter_info_len into a named field, like this (also added user_ctx for bpf_iter link as a demonstration): struct { __aligned_u64 info; __u32 info_len; __aligned_u64 user_ctx; /* see how it's at a different offset than perf_event.user_ctx */ } iter; struct { __u64 user_ctx; } perf_event; (of course keeping already existing fields in anonymous struct for backwards compatibility) I decided to not do that in this patch set, though, to not distract from the main goal. But I think we should avoid this shared field "namespace" across different link types going forward. > > }; > > } link_create; > > > [...]
On 7/29/21 9:31 PM, Andrii Nakryiko wrote: > On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@fb.com> wrote: >> >> >> >> On 7/26/21 9:12 AM, Andrii Nakryiko wrote: >>> Add ability for users to specify custom u64 value when creating BPF link for >>> perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints). >>> >>> This is useful for cases when the same BPF program is used for attaching and >>> processing invocation of different tracepoints/kprobes/uprobes in a generic >>> fashion, but such that each invocation is distinguished from each other (e.g., >>> BPF program can look up additional information associated with a specific >>> kernel function without having to rely on function IP lookups). This enables >>> new use cases to be implemented simply and efficiently that previously were >>> possible only through code generation (and thus multiple instances of almost >>> identical BPF program) or compilation at runtime (BCC-style) on target hosts >>> (even more expensive resource-wise). For uprobes it is not even possible in >>> some cases to know function IP before hand (e.g., when attaching to shared >>> library without PID filtering, in which case base load address is not known >>> for a library). >>> >>> This is done by storing u64 user_ctx in struct bpf_prog_array_item, >>> corresponding to each attached and run BPF program. Given cgroup BPF programs >>> already use 2 8-byte pointers for their needs and cgroup BPF programs don't >>> have (yet?) support for user_ctx, reuse that space through union of >>> cgroup_storage and new user_ctx field. >>> >>> Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx. >>> This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF >>> program execution code, which luckily is now also split from >>> BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper >>> giving access to this user context value from inside a BPF program. Generic >>> perf_event BPF programs will access this value from perf_event itself through >>> passed in BPF program context. >>> >>> Cc: Peter Zijlstra <peterz@infradead.org> >>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> >>> --- >>> drivers/media/rc/bpf-lirc.c | 4 ++-- >>> include/linux/bpf.h | 16 +++++++++++++++- >>> include/linux/perf_event.h | 1 + >>> include/linux/trace_events.h | 6 +++--- >>> include/uapi/linux/bpf.h | 7 +++++++ >>> kernel/bpf/core.c | 29 ++++++++++++++++++----------- >>> kernel/bpf/syscall.c | 2 +- >>> kernel/events/core.c | 21 ++++++++++++++------- >>> kernel/trace/bpf_trace.c | 8 +++++--- >>> tools/include/uapi/linux/bpf.h | 7 +++++++ >>> 10 files changed, 73 insertions(+), 28 deletions(-) >>> >>> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c >>> index afae0afe3f81..7490494273e4 100644 >>> --- a/drivers/media/rc/bpf-lirc.c >>> +++ b/drivers/media/rc/bpf-lirc.c >>> @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog) >>> goto unlock; >>> } >>> >>> - ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array); >>> + ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array); >>> if (ret < 0) >>> goto unlock; >>> >> [...] 
>>> void bpf_trace_run1(struct bpf_prog *prog, u64 arg1); >>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >>> index 00b1267ab4f0..bc1fd54a8f58 100644 >>> --- a/include/uapi/linux/bpf.h >>> +++ b/include/uapi/linux/bpf.h >>> @@ -1448,6 +1448,13 @@ union bpf_attr { >>> __aligned_u64 iter_info; /* extra bpf_iter_link_info */ >>> __u32 iter_info_len; /* iter_info length */ >>> }; >>> + struct { >>> + /* black box user-provided value passed through >>> + * to BPF program at the execution time and >>> + * accessible through bpf_get_user_ctx() BPF helper >>> + */ >>> + __u64 user_ctx; >>> + } perf_event; >> >> Is it possible to fold this field into previous union? >> >> union { >> __u32 target_btf_id; /* btf_id of >> target to attach to */ >> struct { >> __aligned_u64 iter_info; /* >> extra bpf_iter_link_info */ >> __u32 iter_info_len; /* >> iter_info length */ >> }; >> }; >> >> > > I didn't want to do it, because different types of BPF links will > accept this user_ctx (or now bpf_cookie). And then we'll have to have > different locations of that field for different types of links. > > For example, when/if we add this user_ctx to BPF iterator programs, > having __u64 user_ctx in the same anonymous union will make it overlap > with iter_info, which is a problem. So I want to have a link > type-specific sections in LINK_CREATE command section, to allow the > same field name at different locations. > > I actually think that we should put iter_info/iter_info_len into a > named field, like this (also added user_ctx for bpf_iter link as a > demonstration): > > struct { > __aligned_u64 info; > __u32 info_len; > __aligned_u64 user_ctx; /* see how it's at a different offset > than perf_event.user_ctx */ > } iter; > struct { > __u64 user_ctx; > } perf_event; > > (of course keeping already existing fields in anonymous struct for > backwards compatibility) Okay, then since user_ctx may be used by many link types. How about just with the field "user_ctx" without struct perf_event. Sometime like __u64 user_ctx; instead of struct { __u64 user_ctx; } perf_event; > > I decided to not do that in this patch set, though, to not distract > from the main goal. But I think we should avoid this shared field > "namespace" across different link types going forward. > > >>> }; >>> } link_create; >>> >> [...]
On Thu, Jul 29, 2021 at 10:49 PM Yonghong Song <yhs@fb.com> wrote: > > > > On 7/29/21 9:31 PM, Andrii Nakryiko wrote: > > On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@fb.com> wrote: > >> > >> > >> > >> On 7/26/21 9:12 AM, Andrii Nakryiko wrote: > >>> Add ability for users to specify custom u64 value when creating BPF link for > >>> perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints). > >>> > >>> This is useful for cases when the same BPF program is used for attaching and > >>> processing invocation of different tracepoints/kprobes/uprobes in a generic > >>> fashion, but such that each invocation is distinguished from each other (e.g., > >>> BPF program can look up additional information associated with a specific > >>> kernel function without having to rely on function IP lookups). This enables > >>> new use cases to be implemented simply and efficiently that previously were > >>> possible only through code generation (and thus multiple instances of almost > >>> identical BPF program) or compilation at runtime (BCC-style) on target hosts > >>> (even more expensive resource-wise). For uprobes it is not even possible in > >>> some cases to know function IP before hand (e.g., when attaching to shared > >>> library without PID filtering, in which case base load address is not known > >>> for a library). > >>> > >>> This is done by storing u64 user_ctx in struct bpf_prog_array_item, > >>> corresponding to each attached and run BPF program. Given cgroup BPF programs > >>> already use 2 8-byte pointers for their needs and cgroup BPF programs don't > >>> have (yet?) support for user_ctx, reuse that space through union of > >>> cgroup_storage and new user_ctx field. > >>> > >>> Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx. > >>> This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF > >>> program execution code, which luckily is now also split from > >>> BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper > >>> giving access to this user context value from inside a BPF program. Generic > >>> perf_event BPF programs will access this value from perf_event itself through > >>> passed in BPF program context. > >>> > >>> Cc: Peter Zijlstra <peterz@infradead.org> > >>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> > >>> --- > >>> drivers/media/rc/bpf-lirc.c | 4 ++-- > >>> include/linux/bpf.h | 16 +++++++++++++++- > >>> include/linux/perf_event.h | 1 + > >>> include/linux/trace_events.h | 6 +++--- > >>> include/uapi/linux/bpf.h | 7 +++++++ > >>> kernel/bpf/core.c | 29 ++++++++++++++++++----------- > >>> kernel/bpf/syscall.c | 2 +- > >>> kernel/events/core.c | 21 ++++++++++++++------- > >>> kernel/trace/bpf_trace.c | 8 +++++--- > >>> tools/include/uapi/linux/bpf.h | 7 +++++++ > >>> 10 files changed, 73 insertions(+), 28 deletions(-) > >>> > >>> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c > >>> index afae0afe3f81..7490494273e4 100644 > >>> --- a/drivers/media/rc/bpf-lirc.c > >>> +++ b/drivers/media/rc/bpf-lirc.c > >>> @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog) > >>> goto unlock; > >>> } > >>> > >>> - ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array); > >>> + ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array); > >>> if (ret < 0) > >>> goto unlock; > >>> > >> [...] 
> >>> void bpf_trace_run1(struct bpf_prog *prog, u64 arg1); > >>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > >>> index 00b1267ab4f0..bc1fd54a8f58 100644 > >>> --- a/include/uapi/linux/bpf.h > >>> +++ b/include/uapi/linux/bpf.h > >>> @@ -1448,6 +1448,13 @@ union bpf_attr { > >>> __aligned_u64 iter_info; /* extra bpf_iter_link_info */ > >>> __u32 iter_info_len; /* iter_info length */ > >>> }; > >>> + struct { > >>> + /* black box user-provided value passed through > >>> + * to BPF program at the execution time and > >>> + * accessible through bpf_get_user_ctx() BPF helper > >>> + */ > >>> + __u64 user_ctx; > >>> + } perf_event; > >> > >> Is it possible to fold this field into previous union? > >> > >> union { > >> __u32 target_btf_id; /* btf_id of > >> target to attach to */ > >> struct { > >> __aligned_u64 iter_info; /* > >> extra bpf_iter_link_info */ > >> __u32 iter_info_len; /* > >> iter_info length */ > >> }; > >> }; > >> > >> > > > > I didn't want to do it, because different types of BPF links will > > accept this user_ctx (or now bpf_cookie). And then we'll have to have > > different locations of that field for different types of links. > > > > For example, when/if we add this user_ctx to BPF iterator programs, > > having __u64 user_ctx in the same anonymous union will make it overlap > > with iter_info, which is a problem. So I want to have a link > > type-specific sections in LINK_CREATE command section, to allow the > > same field name at different locations. > > > > I actually think that we should put iter_info/iter_info_len into a > > named field, like this (also added user_ctx for bpf_iter link as a > > demonstration): > > > > struct { > > __aligned_u64 info; > > __u32 info_len; > > __aligned_u64 user_ctx; /* see how it's at a different offset > > than perf_event.user_ctx */ > > } iter; > > struct { > > __u64 user_ctx; > > } perf_event; > > > > (of course keeping already existing fields in anonymous struct for > > backwards compatibility) > > Okay, then since user_ctx may be used by many link types. How > about just with the field "user_ctx" without struct perf_event. I'd love to do it because it is indeed generic and common field, like target_fd. But I'm not sure what you are proposing below. Where exactly that user_ctx (now called bpf_cookie) goes in your example? I see few possible options that allow preserving ABI backwards compatibility. Let's see if you and everyone else likes any of those better. I'll use the full LINK_CREATE sub-struct definition from bpf_attr to make it clear. And to demonstrate how this can be extended to bpf_iter in the future, please note this part as this is an important aspect. 1. Full backwards compatibility and per-link type sections (my current approach): struct { /* struct used by BPF_LINK_CREATE command */ __u32 prog_fd; union { __u32 target_fd; __u32 target_ifindex; }; __u32 attach_type; __u32 flags; union { __u32 target_btf_id; struct { __aligned_u64 iter_info; __u32 iter_info_len; }; struct { __u64 bpf_cookie; } perf_event; struct { __aligned_u64 info; __u32 info_len; __aligned_u64 bpf_cookie; } iter; }; } link_create; The good property here is that we can keep easily extending link type-specific sections with extra fields where needed. For common stuff like bpf_cookie it's suboptimal because we'll need to duplicate field definition in each struct inside that union, but I think that's fine. From end-user point of view, they will know which type of link they are creating, so the use will be straightforward. 
This is why I went with this approach. But let's consider alternatives. 2. Non-backwards compatible layout but extra flag to specify that new field layout is used. struct { /* struct used by BPF_LINK_CREATE command */ __u32 prog_fd; union { __u32 target_fd; __u32 target_ifindex; }; __u32 attach_type; __u32 flags; /* this will start supporting some new flag like BPF_F_LINK_CREATE_NEW */ __u64 bpf_cookie; /* common field now */ union { /* this parts is effectively deprecated now */ __u32 target_btf_id; struct { __aligned_u64 iter_info; __u32 iter_info_len; }; struct { /* this is new layout, but needs BPF_F_LINK_CREATE_NEW, at least for ext/ and bpf_iter/ programs */ __u64 bpf_cookie; union { struct { __u32 target_btf_id; } ext; struct { __aligned_u64 info; __u32 info_len; } iter; } } }; } link_create; This makes bpf_cookie a common field, but at least for EXT (freplace/) and ITER (bpf_iter/) links we need to specify extra flag to specify that we are not using iter_info/iter_info_len/target_btf_id. bpf_iter then will use iter.info and iter.info_len, and can use plain bpf_cookie. IMO, this is way too confusing and a maintainability nightmare. I'm trying to guess what you are proposing, I can read it two ways, but let me know if I missed something. 3. Just add bpf_cookie field before link type-specific section. struct { /* struct used by BPF_LINK_CREATE command */ __u32 prog_fd; union { __u32 target_fd; __u32 target_ifindex; }; __u32 attach_type; __u32 flags; __u64 bpf_cookie; // <<<<<<<<<< HERE union { __u32 target_btf_id; struct { __aligned_u64 iter_info; __u32 iter_info_len; }; }; } link_create; This looks really nice and would be great, but that changes offsets for target_btf_id/iter_info/iter_info_len, so a no go. The only way to rectify this is what proposal #2 above does with an extra flag. 4. Add bpf_cookie after link-type specific part: struct { /* struct used by BPF_LINK_CREATE command */ __u32 prog_fd; union { __u32 target_fd; __u32 target_ifindex; }; __u32 attach_type; __u32 flags; union { __u32 target_btf_id; struct { __aligned_u64 iter_info; __u32 iter_info_len; }; struct { }; __u64 bpf_cookie; // <<<<<<<<<<<<<<<<<< HERE } link_create; This could work. But we are wasting 16 bytes currently used for target_btf_id/iter_info/iter_info_len. If we later need to do something link type-specific, we can add it to the existing union if we need <= 16 bytes, otherwise we'll need to start another union after bpf_cookie, splitting this into two link type-specific sections. Overall, this might work, especially assuming we won't need to extend iter-specific portions. But I really hate that we didn't do named structs inside that union (i.e., ext.target_btf_id and iter.info/iter.info_len) and I'd like to rectify that in the follow up patches with named structs duplicating existing field layout, but with proper naming. But splitting this LINK_CREATE bpf_attr part into two unions would make it hard and awkward in the future. So, thoughts? Did you have something else in mind that I missed? > Sometime like > > __u64 user_ctx; > > instead of > > struct { > __u64 user_ctx; > } perf_event; > > > > > I decided to not do that in this patch set, though, to not distract > > from the main goal. But I think we should avoid this shared field > > "namespace" across different link types going forward. > > > > > >>> }; > >>> } link_create; > >>> > >> [...]
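To make option 1 concrete, here is a rough user-space sketch of creating a perf-event BPF link with a cookie under that layout (the perf_event.bpf_cookie field and the BPF_PERF_EVENT attach type follow the proposal above and the rest of the series; prog_fd and perf_fd are assumed to be an already-loaded program and an already-opened perf event):

```c
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* BPF_LINK_CREATE with the perf_event-specific section from option 1:
 * the cookie travels next to the prog_fd/target_fd pair.
 */
static int attach_with_cookie(int prog_fd, int perf_fd, __u64 cookie)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.link_create.prog_fd = prog_fd;
	attr.link_create.target_fd = perf_fd;		/* perf event to attach to */
	attr.link_create.attach_type = BPF_PERF_EVENT;	/* attach type added in this series */
	attr.link_create.perf_event.bpf_cookie = cookie;

	return syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}
```

Each call with the same prog_fd but a different perf_fd/cookie yields a separate link, matching the per-attachment semantics discussed earlier in the thread.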
On 7/30/21 10:48 AM, Andrii Nakryiko wrote: > On Thu, Jul 29, 2021 at 10:49 PM Yonghong Song <yhs@fb.com> wrote: >> >> >> >> On 7/29/21 9:31 PM, Andrii Nakryiko wrote: >>> On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@fb.com> wrote: >>>> >>>> >>>> >>>> On 7/26/21 9:12 AM, Andrii Nakryiko wrote: >>>>> Add ability for users to specify custom u64 value when creating BPF link for >>>>> perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints). >>>>> >>>>> This is useful for cases when the same BPF program is used for attaching and >>>>> processing invocation of different tracepoints/kprobes/uprobes in a generic >>>>> fashion, but such that each invocation is distinguished from each other (e.g., >>>>> BPF program can look up additional information associated with a specific >>>>> kernel function without having to rely on function IP lookups). This enables >>>>> new use cases to be implemented simply and efficiently that previously were >>>>> possible only through code generation (and thus multiple instances of almost >>>>> identical BPF program) or compilation at runtime (BCC-style) on target hosts >>>>> (even more expensive resource-wise). For uprobes it is not even possible in >>>>> some cases to know function IP before hand (e.g., when attaching to shared >>>>> library without PID filtering, in which case base load address is not known >>>>> for a library). >>>>> >>>>> This is done by storing u64 user_ctx in struct bpf_prog_array_item, >>>>> corresponding to each attached and run BPF program. Given cgroup BPF programs >>>>> already use 2 8-byte pointers for their needs and cgroup BPF programs don't >>>>> have (yet?) support for user_ctx, reuse that space through union of >>>>> cgroup_storage and new user_ctx field. >>>>> >>>>> Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx. >>>>> This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF >>>>> program execution code, which luckily is now also split from >>>>> BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper >>>>> giving access to this user context value from inside a BPF program. Generic >>>>> perf_event BPF programs will access this value from perf_event itself through >>>>> passed in BPF program context. >>>>> >>>>> Cc: Peter Zijlstra <peterz@infradead.org> >>>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> >>>>> --- >>>>> drivers/media/rc/bpf-lirc.c | 4 ++-- >>>>> include/linux/bpf.h | 16 +++++++++++++++- >>>>> include/linux/perf_event.h | 1 + >>>>> include/linux/trace_events.h | 6 +++--- >>>>> include/uapi/linux/bpf.h | 7 +++++++ >>>>> kernel/bpf/core.c | 29 ++++++++++++++++++----------- >>>>> kernel/bpf/syscall.c | 2 +- >>>>> kernel/events/core.c | 21 ++++++++++++++------- >>>>> kernel/trace/bpf_trace.c | 8 +++++--- >>>>> tools/include/uapi/linux/bpf.h | 7 +++++++ >>>>> 10 files changed, 73 insertions(+), 28 deletions(-) >>>>> >>>>> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c >>>>> index afae0afe3f81..7490494273e4 100644 >>>>> --- a/drivers/media/rc/bpf-lirc.c >>>>> +++ b/drivers/media/rc/bpf-lirc.c >>>>> @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog) >>>>> goto unlock; >>>>> } >>>>> >>>>> - ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array); >>>>> + ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array); >>>>> if (ret < 0) >>>>> goto unlock; >>>>> >>>> [...] 
>>>>> void bpf_trace_run1(struct bpf_prog *prog, u64 arg1); >>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >>>>> index 00b1267ab4f0..bc1fd54a8f58 100644 >>>>> --- a/include/uapi/linux/bpf.h >>>>> +++ b/include/uapi/linux/bpf.h >>>>> @@ -1448,6 +1448,13 @@ union bpf_attr { >>>>> __aligned_u64 iter_info; /* extra bpf_iter_link_info */ >>>>> __u32 iter_info_len; /* iter_info length */ >>>>> }; >>>>> + struct { >>>>> + /* black box user-provided value passed through >>>>> + * to BPF program at the execution time and >>>>> + * accessible through bpf_get_user_ctx() BPF helper >>>>> + */ >>>>> + __u64 user_ctx; >>>>> + } perf_event; >>>> >>>> Is it possible to fold this field into previous union? >>>> >>>> union { >>>> __u32 target_btf_id; /* btf_id of >>>> target to attach to */ >>>> struct { >>>> __aligned_u64 iter_info; /* >>>> extra bpf_iter_link_info */ >>>> __u32 iter_info_len; /* >>>> iter_info length */ >>>> }; >>>> }; >>>> >>>> >>> >>> I didn't want to do it, because different types of BPF links will >>> accept this user_ctx (or now bpf_cookie). And then we'll have to have >>> different locations of that field for different types of links. >>> >>> For example, when/if we add this user_ctx to BPF iterator programs, >>> having __u64 user_ctx in the same anonymous union will make it overlap >>> with iter_info, which is a problem. So I want to have a link >>> type-specific sections in LINK_CREATE command section, to allow the >>> same field name at different locations. >>> >>> I actually think that we should put iter_info/iter_info_len into a >>> named field, like this (also added user_ctx for bpf_iter link as a >>> demonstration): >>> >>> struct { >>> __aligned_u64 info; >>> __u32 info_len; >>> __aligned_u64 user_ctx; /* see how it's at a different offset >>> than perf_event.user_ctx */ >>> } iter; >>> struct { >>> __u64 user_ctx; >>> } perf_event; >>> >>> (of course keeping already existing fields in anonymous struct for >>> backwards compatibility) >> >> Okay, then since user_ctx may be used by many link types. How >> about just with the field "user_ctx" without struct perf_event. > > I'd love to do it because it is indeed generic and common field, like > target_fd. But I'm not sure what you are proposing below. Where > exactly that user_ctx (now called bpf_cookie) goes in your example? I > see few possible options that allow preserving ABI backwards > compatibility. Let's see if you and everyone else likes any of those > better. I'll use the full LINK_CREATE sub-struct definition from > bpf_attr to make it clear. And to demonstrate how this can be extended > to bpf_iter in the future, please note this part as this is an > important aspect. > > 1. Full backwards compatibility and per-link type sections (my current > approach): > > struct { /* struct used by BPF_LINK_CREATE command */ > __u32 prog_fd; > union { > __u32 target_fd; > __u32 target_ifindex; > }; > __u32 attach_type; > __u32 flags; > union { > __u32 target_btf_id; > struct { > __aligned_u64 iter_info; > __u32 iter_info_len; > }; > struct { > __u64 bpf_cookie; > } perf_event; > struct { > __aligned_u64 info; > __u32 info_len; > __aligned_u64 bpf_cookie; > } iter; > }; > } link_create; > > The good property here is that we can keep easily extending link > type-specific sections with extra fields where needed. For common > stuff like bpf_cookie it's suboptimal because we'll need to duplicate > field definition in each struct inside that union, but I think that's > fine. 
From end-user point of view, they will know which type of link > they are creating, so the use will be straightforward. This is why I > went with this approach. But let's consider alternatives. > > 2. Non-backwards compatible layout but extra flag to specify that new > field layout is used. > > struct { /* struct used by BPF_LINK_CREATE command */ > __u32 prog_fd; > union { > __u32 target_fd; > __u32 target_ifindex; > }; > __u32 attach_type; > __u32 flags; /* this will start supporting > some new flag like BPF_F_LINK_CREATE_NEW */ > __u64 bpf_cookie; /* common field now */ > union { /* this parts is effectively deprecated now */ > __u32 target_btf_id; > struct { > __aligned_u64 iter_info; > __u32 iter_info_len; > }; > struct { /* this is new layout, but needs > BPF_F_LINK_CREATE_NEW, at least for ext/ and bpf_iter/ programs */ > __u64 bpf_cookie; > union { > struct { > __u32 target_btf_id; > } ext; > struct { > __aligned_u64 info; > __u32 info_len; > } iter; > } > } > }; > } link_create; > > This makes bpf_cookie a common field, but at least for EXT (freplace/) > and ITER (bpf_iter/) links we need to specify extra flag to specify > that we are not using iter_info/iter_info_len/target_btf_id. bpf_iter > then will use iter.info and iter.info_len, and can use plain > bpf_cookie. > > IMO, this is way too confusing and a maintainability nightmare. > > I'm trying to guess what you are proposing, I can read it two ways, > but let me know if I missed something. > > 3. Just add bpf_cookie field before link type-specific section. > > struct { /* struct used by BPF_LINK_CREATE command */ > __u32 prog_fd; > union { > __u32 target_fd; > __u32 target_ifindex; > }; > __u32 attach_type; > __u32 flags; > __u64 bpf_cookie; // <<<<<<<<<< HERE > union { > __u32 target_btf_id; > struct { > __aligned_u64 iter_info; > __u32 iter_info_len; > }; > }; > } link_create; > > This looks really nice and would be great, but that changes offsets > for target_btf_id/iter_info/iter_info_len, so a no go. The only way to > rectify this is what proposal #2 above does with an extra flag. > > 4. Add bpf_cookie after link-type specific part: > > struct { /* struct used by BPF_LINK_CREATE command */ > __u32 prog_fd; > union { > __u32 target_fd; > __u32 target_ifindex; > }; > __u32 attach_type; > __u32 flags; > union { > __u32 target_btf_id; > struct { > __aligned_u64 iter_info; > __u32 iter_info_len; > }; > struct { > }; > __u64 bpf_cookie; // <<<<<<<<<<<<<<<<<< HERE > } link_create; > > This could work. But we are wasting 16 bytes currently used for > target_btf_id/iter_info/iter_info_len. If we later need to do > something link type-specific, we can add it to the existing union if > we need <= 16 bytes, otherwise we'll need to start another union after > bpf_cookie, splitting this into two link type-specific sections. > > Overall, this might work, especially assuming we won't need to extend > iter-specific portions. But I really hate that we didn't do named > structs inside that union (i.e., ext.target_btf_id and > iter.info/iter.info_len) and I'd like to rectify that in the follow up > patches with named structs duplicating existing field layout, but with > proper naming. But splitting this LINK_CREATE bpf_attr part into two > unions would make it hard and awkward in the future. > > So, thoughts? Did you have something else in mind that I missed? What I proposed is your option 4. Yes, in the future if there is there are something we want to add to bpf iter, we can add to iter_info, so it should not be an issue. 
Any other new link_type may utilize the same union with

	struct {
		__aligned_u64	new_type_info;
		__u32		new_type_info_len;
	};

and this will put the extensibility into new_type_info. I know this may be
a bit of a hassle, but it should work.

Your option 1 should work too, which is what I proposed in the beginning to
put into the union, and we can feel free to add bpf_cookie for each
individual link type. This is actually cleaner.

>
>> Sometime like
>>
>>      __u64 user_ctx;
>>
>> instead of
>>
>>      struct {
>>              __u64 user_ctx;
>>      } perf_event;
>>
>>>
>>> I decided to not do that in this patch set, though, to not distract
>>> from the main goal. But I think we should avoid this shared field
>>> "namespace" across different link types going forward.
>>>
>>>
>>>>>           };
>>>>>   } link_create;
>>>>>
>>>> [...]
On Fri, Jul 30, 2021 at 2:34 PM Yonghong Song <yhs@fb.com> wrote: > > > > On 7/30/21 10:48 AM, Andrii Nakryiko wrote: > > On Thu, Jul 29, 2021 at 10:49 PM Yonghong Song <yhs@fb.com> wrote: > >> > >> > >> > >> On 7/29/21 9:31 PM, Andrii Nakryiko wrote: > >>> On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@fb.com> wrote: > >>>> > >>>> > >>>> > >>>> On 7/26/21 9:12 AM, Andrii Nakryiko wrote: > >>>>> Add ability for users to specify custom u64 value when creating BPF link for > >>>>> perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints). > >>>>> > >>>>> This is useful for cases when the same BPF program is used for attaching and > >>>>> processing invocation of different tracepoints/kprobes/uprobes in a generic > >>>>> fashion, but such that each invocation is distinguished from each other (e.g., > >>>>> BPF program can look up additional information associated with a specific > >>>>> kernel function without having to rely on function IP lookups). This enables > >>>>> new use cases to be implemented simply and efficiently that previously were > >>>>> possible only through code generation (and thus multiple instances of almost > >>>>> identical BPF program) or compilation at runtime (BCC-style) on target hosts > >>>>> (even more expensive resource-wise). For uprobes it is not even possible in > >>>>> some cases to know function IP before hand (e.g., when attaching to shared > >>>>> library without PID filtering, in which case base load address is not known > >>>>> for a library). > >>>>> > >>>>> This is done by storing u64 user_ctx in struct bpf_prog_array_item, > >>>>> corresponding to each attached and run BPF program. Given cgroup BPF programs > >>>>> already use 2 8-byte pointers for their needs and cgroup BPF programs don't > >>>>> have (yet?) support for user_ctx, reuse that space through union of > >>>>> cgroup_storage and new user_ctx field. > >>>>> > >>>>> Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx. > >>>>> This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF > >>>>> program execution code, which luckily is now also split from > >>>>> BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper > >>>>> giving access to this user context value from inside a BPF program. Generic > >>>>> perf_event BPF programs will access this value from perf_event itself through > >>>>> passed in BPF program context. 
> >>>>> > >>>>> Cc: Peter Zijlstra <peterz@infradead.org> > >>>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> > >>>>> --- > >>>>> drivers/media/rc/bpf-lirc.c | 4 ++-- > >>>>> include/linux/bpf.h | 16 +++++++++++++++- > >>>>> include/linux/perf_event.h | 1 + > >>>>> include/linux/trace_events.h | 6 +++--- > >>>>> include/uapi/linux/bpf.h | 7 +++++++ > >>>>> kernel/bpf/core.c | 29 ++++++++++++++++++----------- > >>>>> kernel/bpf/syscall.c | 2 +- > >>>>> kernel/events/core.c | 21 ++++++++++++++------- > >>>>> kernel/trace/bpf_trace.c | 8 +++++--- > >>>>> tools/include/uapi/linux/bpf.h | 7 +++++++ > >>>>> 10 files changed, 73 insertions(+), 28 deletions(-) > >>>>> > >>>>> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c > >>>>> index afae0afe3f81..7490494273e4 100644 > >>>>> --- a/drivers/media/rc/bpf-lirc.c > >>>>> +++ b/drivers/media/rc/bpf-lirc.c > >>>>> @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog) > >>>>> goto unlock; > >>>>> } > >>>>> > >>>>> - ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array); > >>>>> + ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array); > >>>>> if (ret < 0) > >>>>> goto unlock; > >>>>> > >>>> [...] > >>>>> void bpf_trace_run1(struct bpf_prog *prog, u64 arg1); > >>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > >>>>> index 00b1267ab4f0..bc1fd54a8f58 100644 > >>>>> --- a/include/uapi/linux/bpf.h > >>>>> +++ b/include/uapi/linux/bpf.h > >>>>> @@ -1448,6 +1448,13 @@ union bpf_attr { > >>>>> __aligned_u64 iter_info; /* extra bpf_iter_link_info */ > >>>>> __u32 iter_info_len; /* iter_info length */ > >>>>> }; > >>>>> + struct { > >>>>> + /* black box user-provided value passed through > >>>>> + * to BPF program at the execution time and > >>>>> + * accessible through bpf_get_user_ctx() BPF helper > >>>>> + */ > >>>>> + __u64 user_ctx; > >>>>> + } perf_event; > >>>> > >>>> Is it possible to fold this field into previous union? > >>>> > >>>> union { > >>>> __u32 target_btf_id; /* btf_id of > >>>> target to attach to */ > >>>> struct { > >>>> __aligned_u64 iter_info; /* > >>>> extra bpf_iter_link_info */ > >>>> __u32 iter_info_len; /* > >>>> iter_info length */ > >>>> }; > >>>> }; > >>>> > >>>> > >>> > >>> I didn't want to do it, because different types of BPF links will > >>> accept this user_ctx (or now bpf_cookie). And then we'll have to have > >>> different locations of that field for different types of links. > >>> > >>> For example, when/if we add this user_ctx to BPF iterator programs, > >>> having __u64 user_ctx in the same anonymous union will make it overlap > >>> with iter_info, which is a problem. So I want to have a link > >>> type-specific sections in LINK_CREATE command section, to allow the > >>> same field name at different locations. > >>> > >>> I actually think that we should put iter_info/iter_info_len into a > >>> named field, like this (also added user_ctx for bpf_iter link as a > >>> demonstration): > >>> > >>> struct { > >>> __aligned_u64 info; > >>> __u32 info_len; > >>> __aligned_u64 user_ctx; /* see how it's at a different offset > >>> than perf_event.user_ctx */ > >>> } iter; > >>> struct { > >>> __u64 user_ctx; > >>> } perf_event; > >>> > >>> (of course keeping already existing fields in anonymous struct for > >>> backwards compatibility) > >> > >> Okay, then since user_ctx may be used by many link types. How > >> about just with the field "user_ctx" without struct perf_event. 
> > > > I'd love to do it because it is indeed generic and common field, like > > target_fd. But I'm not sure what you are proposing below. Where > > exactly that user_ctx (now called bpf_cookie) goes in your example? I > > see few possible options that allow preserving ABI backwards > > compatibility. Let's see if you and everyone else likes any of those > > better. I'll use the full LINK_CREATE sub-struct definition from > > bpf_attr to make it clear. And to demonstrate how this can be extended > > to bpf_iter in the future, please note this part as this is an > > important aspect. > > > > 1. Full backwards compatibility and per-link type sections (my current > > approach): > > > > struct { /* struct used by BPF_LINK_CREATE command */ > > __u32 prog_fd; > > union { > > __u32 target_fd; > > __u32 target_ifindex; > > }; > > __u32 attach_type; > > __u32 flags; > > union { > > __u32 target_btf_id; > > struct { > > __aligned_u64 iter_info; > > __u32 iter_info_len; > > }; > > struct { > > __u64 bpf_cookie; > > } perf_event; > > struct { > > __aligned_u64 info; > > __u32 info_len; > > __aligned_u64 bpf_cookie; > > } iter; > > }; > > } link_create; > > > > The good property here is that we can keep easily extending link > > type-specific sections with extra fields where needed. For common > > stuff like bpf_cookie it's suboptimal because we'll need to duplicate > > field definition in each struct inside that union, but I think that's > > fine. From end-user point of view, they will know which type of link > > they are creating, so the use will be straightforward. This is why I > > went with this approach. But let's consider alternatives. > > > > 2. Non-backwards compatible layout but extra flag to specify that new > > field layout is used. > > > > struct { /* struct used by BPF_LINK_CREATE command */ > > __u32 prog_fd; > > union { > > __u32 target_fd; > > __u32 target_ifindex; > > }; > > __u32 attach_type; > > __u32 flags; /* this will start supporting > > some new flag like BPF_F_LINK_CREATE_NEW */ > > __u64 bpf_cookie; /* common field now */ > > union { /* this parts is effectively deprecated now */ > > __u32 target_btf_id; > > struct { > > __aligned_u64 iter_info; > > __u32 iter_info_len; > > }; > > struct { /* this is new layout, but needs > > BPF_F_LINK_CREATE_NEW, at least for ext/ and bpf_iter/ programs */ > > __u64 bpf_cookie; > > union { > > struct { > > __u32 target_btf_id; > > } ext; > > struct { > > __aligned_u64 info; > > __u32 info_len; > > } iter; > > } > > } > > }; > > } link_create; > > > > This makes bpf_cookie a common field, but at least for EXT (freplace/) > > and ITER (bpf_iter/) links we need to specify extra flag to specify > > that we are not using iter_info/iter_info_len/target_btf_id. bpf_iter > > then will use iter.info and iter.info_len, and can use plain > > bpf_cookie. > > > > IMO, this is way too confusing and a maintainability nightmare. > > > > I'm trying to guess what you are proposing, I can read it two ways, > > but let me know if I missed something. > > > > 3. Just add bpf_cookie field before link type-specific section. 
> > > > struct { /* struct used by BPF_LINK_CREATE command */ > > __u32 prog_fd; > > union { > > __u32 target_fd; > > __u32 target_ifindex; > > }; > > __u32 attach_type; > > __u32 flags; > > __u64 bpf_cookie; // <<<<<<<<<< HERE > > union { > > __u32 target_btf_id; > > struct { > > __aligned_u64 iter_info; > > __u32 iter_info_len; > > }; > > }; > > } link_create; > > > > This looks really nice and would be great, but that changes offsets > > for target_btf_id/iter_info/iter_info_len, so a no go. The only way to > > rectify this is what proposal #2 above does with an extra flag. > > > > 4. Add bpf_cookie after link-type specific part: > > > > struct { /* struct used by BPF_LINK_CREATE command */ > > __u32 prog_fd; > > union { > > __u32 target_fd; > > __u32 target_ifindex; > > }; > > __u32 attach_type; > > __u32 flags; > > union { > > __u32 target_btf_id; > > struct { > > __aligned_u64 iter_info; > > __u32 iter_info_len; > > }; > > struct { > > }; > > __u64 bpf_cookie; // <<<<<<<<<<<<<<<<<< HERE > > } link_create; > > > > This could work. But we are wasting 16 bytes currently used for > > target_btf_id/iter_info/iter_info_len. If we later need to do > > something link type-specific, we can add it to the existing union if > > we need <= 16 bytes, otherwise we'll need to start another union after > > bpf_cookie, splitting this into two link type-specific sections. > > > > Overall, this might work, especially assuming we won't need to extend > > iter-specific portions. But I really hate that we didn't do named > > structs inside that union (i.e., ext.target_btf_id and > > iter.info/iter.info_len) and I'd like to rectify that in the follow up > > patches with named structs duplicating existing field layout, but with > > proper naming. But splitting this LINK_CREATE bpf_attr part into two > > unions would make it hard and awkward in the future. > > > > So, thoughts? Did you have something else in mind that I missed? > > What I proposed is your option 4. Yes, in the future if there is there > are something we want to add to bpf iter, we can add to iter_info, so > it should not be an issue. Any other new link_type may utilized the same > union with > struct { > __aligned_u64 new_type_info; > __u32 new_type_info_len; > }; > and this will put extensibility into new_type_info. > I know this may be a little bit hassle but it should work. > I see what you mean. With this extra pointer we shouldn't need more than 16 bytes per link type. That's unnecessary complication for a lot of simpler types of links, unfortunately, though definitely an option. We could have also done approach #4 but maybe leave 16-32 bytes before bpf_cookie for the union, so that it's much less likely that we'll run out of space there. Not very clean either, so I don't know. I'll keep it here for discussion for now, let's see if anyone has strong preferences and opinions. > Your option 1 should work too, which is what I proposed in the beginning > to put into the union and we can feel free to add bpf_cookie for each > individual link type. This is actually cleaner. Oh, you did? I must have misunderstood then. If you like approach #1, then it's what I'm doing right now, so let's keep it as is and let's see if anyone else has preferences. > > > > > > >> Sometime like > >> > >> __u64 user_ctx; > >> > >> instead of > >> > >> struct { > >> __u64 user_ctx; > >> } perf_event; > >> > >>> > >>> I decided to not do that in this patch set, though, to not distract > >>> from the main goal. 
But I think we should avoid this shared field > >>> "namespace" across different link types going forward. > >>> > >>> > >>>>> }; > >>>>> } link_create; > >>>>> > >>>> [...]
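To make the layouts being debated concrete from the user side: with the per-link-type section of option 1 (the field is still called user_ctx in this revision of the series), passing the value is one extra assignment before the BPF_LINK_CREATE call. Below is a minimal userspace sketch, assuming uapi headers that already contain this patch and assuming the new attach type added earlier in the series is named BPF_PERF_EVENT; prog_fd and perf_fd are taken as given from BPF_PROG_LOAD and perf_event_open().

#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create a BPF link for an already-opened perf_event FD and associate the
 * user-provided 64-bit value with that particular attachment. BPF_PERF_EVENT
 * is assumed to be the attach type introduced earlier in this series.
 */
static int perf_link_create(int prog_fd, int perf_fd, __u64 user_ctx)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.link_create.prog_fd = prog_fd;
	attr.link_create.target_fd = perf_fd;
	attr.link_create.attach_type = BPF_PERF_EVENT;
	attr.link_create.perf_event.user_ctx = user_ctx;

	return syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
}

The same program can be linked to many events, each link carrying a different user_ctx, which is the whole point of the feature.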
On 7/30/21 3:06 PM, Andrii Nakryiko wrote: > On Fri, Jul 30, 2021 at 2:34 PM Yonghong Song <yhs@fb.com> wrote: >> >> >> >> On 7/30/21 10:48 AM, Andrii Nakryiko wrote: >>> On Thu, Jul 29, 2021 at 10:49 PM Yonghong Song <yhs@fb.com> wrote: >>>> >>>> >>>> >>>> On 7/29/21 9:31 PM, Andrii Nakryiko wrote: >>>>> On Thu, Jul 29, 2021 at 11:00 AM Yonghong Song <yhs@fb.com> wrote: >>>>>> >>>>>> >>>>>> >>>>>> On 7/26/21 9:12 AM, Andrii Nakryiko wrote: >>>>>>> Add ability for users to specify custom u64 value when creating BPF link for >>>>>>> perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints). >>>>>>> >>>>>>> This is useful for cases when the same BPF program is used for attaching and >>>>>>> processing invocation of different tracepoints/kprobes/uprobes in a generic >>>>>>> fashion, but such that each invocation is distinguished from each other (e.g., >>>>>>> BPF program can look up additional information associated with a specific >>>>>>> kernel function without having to rely on function IP lookups). This enables >>>>>>> new use cases to be implemented simply and efficiently that previously were >>>>>>> possible only through code generation (and thus multiple instances of almost >>>>>>> identical BPF program) or compilation at runtime (BCC-style) on target hosts >>>>>>> (even more expensive resource-wise). For uprobes it is not even possible in >>>>>>> some cases to know function IP before hand (e.g., when attaching to shared >>>>>>> library without PID filtering, in which case base load address is not known >>>>>>> for a library). >>>>>>> >>>>>>> This is done by storing u64 user_ctx in struct bpf_prog_array_item, >>>>>>> corresponding to each attached and run BPF program. Given cgroup BPF programs >>>>>>> already use 2 8-byte pointers for their needs and cgroup BPF programs don't >>>>>>> have (yet?) support for user_ctx, reuse that space through union of >>>>>>> cgroup_storage and new user_ctx field. >>>>>>> >>>>>>> Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx. >>>>>>> This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF >>>>>>> program execution code, which luckily is now also split from >>>>>>> BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper >>>>>>> giving access to this user context value from inside a BPF program. Generic >>>>>>> perf_event BPF programs will access this value from perf_event itself through >>>>>>> passed in BPF program context. 
>>>>>>> >>>>>>> Cc: Peter Zijlstra <peterz@infradead.org> >>>>>>> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> >>>>>>> --- >>>>>>> drivers/media/rc/bpf-lirc.c | 4 ++-- >>>>>>> include/linux/bpf.h | 16 +++++++++++++++- >>>>>>> include/linux/perf_event.h | 1 + >>>>>>> include/linux/trace_events.h | 6 +++--- >>>>>>> include/uapi/linux/bpf.h | 7 +++++++ >>>>>>> kernel/bpf/core.c | 29 ++++++++++++++++++----------- >>>>>>> kernel/bpf/syscall.c | 2 +- >>>>>>> kernel/events/core.c | 21 ++++++++++++++------- >>>>>>> kernel/trace/bpf_trace.c | 8 +++++--- >>>>>>> tools/include/uapi/linux/bpf.h | 7 +++++++ >>>>>>> 10 files changed, 73 insertions(+), 28 deletions(-) >>>>>>> >>>>>>> diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c >>>>>>> index afae0afe3f81..7490494273e4 100644 >>>>>>> --- a/drivers/media/rc/bpf-lirc.c >>>>>>> +++ b/drivers/media/rc/bpf-lirc.c >>>>>>> @@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog) >>>>>>> goto unlock; >>>>>>> } >>>>>>> >>>>>>> - ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array); >>>>>>> + ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array); >>>>>>> if (ret < 0) >>>>>>> goto unlock; >>>>>>> >>>>>> [...] >>>>>>> void bpf_trace_run1(struct bpf_prog *prog, u64 arg1); >>>>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h >>>>>>> index 00b1267ab4f0..bc1fd54a8f58 100644 >>>>>>> --- a/include/uapi/linux/bpf.h >>>>>>> +++ b/include/uapi/linux/bpf.h >>>>>>> @@ -1448,6 +1448,13 @@ union bpf_attr { >>>>>>> __aligned_u64 iter_info; /* extra bpf_iter_link_info */ >>>>>>> __u32 iter_info_len; /* iter_info length */ >>>>>>> }; >>>>>>> + struct { >>>>>>> + /* black box user-provided value passed through >>>>>>> + * to BPF program at the execution time and >>>>>>> + * accessible through bpf_get_user_ctx() BPF helper >>>>>>> + */ >>>>>>> + __u64 user_ctx; >>>>>>> + } perf_event; >>>>>> >>>>>> Is it possible to fold this field into previous union? >>>>>> >>>>>> union { >>>>>> __u32 target_btf_id; /* btf_id of >>>>>> target to attach to */ >>>>>> struct { >>>>>> __aligned_u64 iter_info; /* >>>>>> extra bpf_iter_link_info */ >>>>>> __u32 iter_info_len; /* >>>>>> iter_info length */ >>>>>> }; >>>>>> }; >>>>>> >>>>>> >>>>> >>>>> I didn't want to do it, because different types of BPF links will >>>>> accept this user_ctx (or now bpf_cookie). And then we'll have to have >>>>> different locations of that field for different types of links. >>>>> >>>>> For example, when/if we add this user_ctx to BPF iterator programs, >>>>> having __u64 user_ctx in the same anonymous union will make it overlap >>>>> with iter_info, which is a problem. So I want to have a link >>>>> type-specific sections in LINK_CREATE command section, to allow the >>>>> same field name at different locations. >>>>> >>>>> I actually think that we should put iter_info/iter_info_len into a >>>>> named field, like this (also added user_ctx for bpf_iter link as a >>>>> demonstration): >>>>> >>>>> struct { >>>>> __aligned_u64 info; >>>>> __u32 info_len; >>>>> __aligned_u64 user_ctx; /* see how it's at a different offset >>>>> than perf_event.user_ctx */ >>>>> } iter; >>>>> struct { >>>>> __u64 user_ctx; >>>>> } perf_event; >>>>> >>>>> (of course keeping already existing fields in anonymous struct for >>>>> backwards compatibility) >>>> >>>> Okay, then since user_ctx may be used by many link types. How >>>> about just with the field "user_ctx" without struct perf_event. 
>>> >>> I'd love to do it because it is indeed generic and common field, like >>> target_fd. But I'm not sure what you are proposing below. Where >>> exactly that user_ctx (now called bpf_cookie) goes in your example? I >>> see few possible options that allow preserving ABI backwards >>> compatibility. Let's see if you and everyone else likes any of those >>> better. I'll use the full LINK_CREATE sub-struct definition from >>> bpf_attr to make it clear. And to demonstrate how this can be extended >>> to bpf_iter in the future, please note this part as this is an >>> important aspect. >>> >>> 1. Full backwards compatibility and per-link type sections (my current >>> approach): >>> >>> struct { /* struct used by BPF_LINK_CREATE command */ >>> __u32 prog_fd; >>> union { >>> __u32 target_fd; >>> __u32 target_ifindex; >>> }; >>> __u32 attach_type; >>> __u32 flags; >>> union { >>> __u32 target_btf_id; >>> struct { >>> __aligned_u64 iter_info; >>> __u32 iter_info_len; >>> }; >>> struct { >>> __u64 bpf_cookie; >>> } perf_event; >>> struct { >>> __aligned_u64 info; >>> __u32 info_len; >>> __aligned_u64 bpf_cookie; >>> } iter; >>> }; >>> } link_create; >>> >>> The good property here is that we can keep easily extending link >>> type-specific sections with extra fields where needed. For common >>> stuff like bpf_cookie it's suboptimal because we'll need to duplicate >>> field definition in each struct inside that union, but I think that's >>> fine. From end-user point of view, they will know which type of link >>> they are creating, so the use will be straightforward. This is why I >>> went with this approach. But let's consider alternatives. >>> >>> 2. Non-backwards compatible layout but extra flag to specify that new >>> field layout is used. >>> >>> struct { /* struct used by BPF_LINK_CREATE command */ >>> __u32 prog_fd; >>> union { >>> __u32 target_fd; >>> __u32 target_ifindex; >>> }; >>> __u32 attach_type; >>> __u32 flags; /* this will start supporting >>> some new flag like BPF_F_LINK_CREATE_NEW */ >>> __u64 bpf_cookie; /* common field now */ >>> union { /* this parts is effectively deprecated now */ >>> __u32 target_btf_id; >>> struct { >>> __aligned_u64 iter_info; >>> __u32 iter_info_len; >>> }; >>> struct { /* this is new layout, but needs >>> BPF_F_LINK_CREATE_NEW, at least for ext/ and bpf_iter/ programs */ >>> __u64 bpf_cookie; >>> union { >>> struct { >>> __u32 target_btf_id; >>> } ext; >>> struct { >>> __aligned_u64 info; >>> __u32 info_len; >>> } iter; >>> } >>> } >>> }; >>> } link_create; >>> >>> This makes bpf_cookie a common field, but at least for EXT (freplace/) >>> and ITER (bpf_iter/) links we need to specify extra flag to specify >>> that we are not using iter_info/iter_info_len/target_btf_id. bpf_iter >>> then will use iter.info and iter.info_len, and can use plain >>> bpf_cookie. >>> >>> IMO, this is way too confusing and a maintainability nightmare. >>> >>> I'm trying to guess what you are proposing, I can read it two ways, >>> but let me know if I missed something. >>> >>> 3. Just add bpf_cookie field before link type-specific section. 
>>> >>> struct { /* struct used by BPF_LINK_CREATE command */ >>> __u32 prog_fd; >>> union { >>> __u32 target_fd; >>> __u32 target_ifindex; >>> }; >>> __u32 attach_type; >>> __u32 flags; >>> __u64 bpf_cookie; // <<<<<<<<<< HERE >>> union { >>> __u32 target_btf_id; >>> struct { >>> __aligned_u64 iter_info; >>> __u32 iter_info_len; >>> }; >>> }; >>> } link_create; >>> >>> This looks really nice and would be great, but that changes offsets >>> for target_btf_id/iter_info/iter_info_len, so a no go. The only way to >>> rectify this is what proposal #2 above does with an extra flag. >>> >>> 4. Add bpf_cookie after link-type specific part: >>> >>> struct { /* struct used by BPF_LINK_CREATE command */ >>> __u32 prog_fd; >>> union { >>> __u32 target_fd; >>> __u32 target_ifindex; >>> }; >>> __u32 attach_type; >>> __u32 flags; >>> union { >>> __u32 target_btf_id; >>> struct { >>> __aligned_u64 iter_info; >>> __u32 iter_info_len; >>> }; >>> struct { >>> }; >>> __u64 bpf_cookie; // <<<<<<<<<<<<<<<<<< HERE >>> } link_create; >>> >>> This could work. But we are wasting 16 bytes currently used for >>> target_btf_id/iter_info/iter_info_len. If we later need to do >>> something link type-specific, we can add it to the existing union if >>> we need <= 16 bytes, otherwise we'll need to start another union after >>> bpf_cookie, splitting this into two link type-specific sections. >>> >>> Overall, this might work, especially assuming we won't need to extend >>> iter-specific portions. But I really hate that we didn't do named >>> structs inside that union (i.e., ext.target_btf_id and >>> iter.info/iter.info_len) and I'd like to rectify that in the follow up >>> patches with named structs duplicating existing field layout, but with >>> proper naming. But splitting this LINK_CREATE bpf_attr part into two >>> unions would make it hard and awkward in the future. >>> >>> So, thoughts? Did you have something else in mind that I missed? >> >> What I proposed is your option 4. Yes, in the future if there is there >> are something we want to add to bpf iter, we can add to iter_info, so >> it should not be an issue. Any other new link_type may utilized the same >> union with >> struct { >> __aligned_u64 new_type_info; >> __u32 new_type_info_len; >> }; >> and this will put extensibility into new_type_info. >> I know this may be a little bit hassle but it should work. >> > > I see what you mean. With this extra pointer we shouldn't need more > than 16 bytes per link type. That's unnecessary complication for a lot > of simpler types of links, unfortunately, though definitely an option. > > We could have also done approach #4 but maybe leave 16-32 bytes before > bpf_cookie for the union, so that it's much less likely that we'll run > out of space there. Not very clean either, so I don't know. > > I'll keep it here for discussion for now, let's see if anyone has > strong preferences and opinions. > >> Your option 1 should work too, which is what I proposed in the beginning >> to put into the union and we can feel free to add bpf_cookie for each >> individual link type. This is actually cleaner. > > Oh, you did? I must have misunderstood then. If you like approach #1, > then it's what I'm doing right now, so let's keep it as is and let's > see if anyone else has preferences. Just checked old emails. It is actually my misunderstanding. I probably mismatched "{" and "}" and thought you placed outside the union and made the suggestion. 
So never mind, we are on the same page :-) > >> >>> >>> >>>> Sometime like >>>> >>>> __u64 user_ctx; >>>> >>>> instead of >>>> >>>> struct { >>>> __u64 user_ctx; >>>> } perf_event; >>>> >>>>> >>>>> I decided to not do that in this patch set, though, to not distract >>>>> from the main goal. But I think we should avoid this shared field >>>>> "namespace" across different link types going forward. >>>>> >>>>> >>>>>>> }; >>>>>>> } link_create; >>>>>>> >>>>>> [...]
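Before the patch itself, the ABI argument against option 3 can be shown in isolation: inserting a __u64 ahead of the existing union shifts every member of that union. A standalone sketch, where the two structs are simplified stand-ins for the quoted link_create layouts (plain uint64_t is used where the uapi has __aligned_u64):

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct link_create_cur {		/* layout as it exists today */
	uint32_t prog_fd;
	uint32_t target_fd;
	uint32_t attach_type;
	uint32_t flags;
	union {
		uint32_t target_btf_id;
		struct {
			uint64_t iter_info;
			uint32_t iter_info_len;
		};
	};
};

struct link_create_opt3 {		/* option 3: cookie before the union */
	uint32_t prog_fd;
	uint32_t target_fd;
	uint32_t attach_type;
	uint32_t flags;
	uint64_t bpf_cookie;
	union {
		uint32_t target_btf_id;
		struct {
			uint64_t iter_info;
			uint32_t iter_info_len;
		};
	};
};

int main(void)
{
	/* existing binaries put iter_info at offset 16; an option-3 kernel
	 * would look for it at offset 24, silently misreading old attrs
	 */
	printf("iter_info offset: %zu -> %zu\n",
	       offsetof(struct link_create_cur, iter_info),
	       offsetof(struct link_create_opt3, iter_info));
	return 0;
}

That offset shift is why option 3 would need something like the BPF_F_LINK_CREATE_NEW flag from option 2 to be usable at all, while options 1 and 4 keep every existing member where it is.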
diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index afae0afe3f81..7490494273e4 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -160,7 +160,7 @@ static int lirc_bpf_attach(struct rc_dev *rcdev, struct bpf_prog *prog)
 		goto unlock;
 	}

-	ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
+	ret = bpf_prog_array_copy(old_array, NULL, prog, 0, &new_array);
 	if (ret < 0)
 		goto unlock;

@@ -193,7 +193,7 @@ static int lirc_bpf_detach(struct rc_dev *rcdev, struct bpf_prog *prog)
 	}

 	old_array = lirc_rcu_dereference(raw->progs);
-	ret = bpf_prog_array_copy(old_array, prog, NULL, &new_array);
+	ret = bpf_prog_array_copy(old_array, prog, NULL, 0, &new_array);
 	/*
 	 * Do not use bpf_prog_array_delete_safe() as we would end up
 	 * with a dummy entry in the array, and the we would free the
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9c44b56b698f..74b35faf0b73 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1114,7 +1114,10 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
  */
 struct bpf_prog_array_item {
 	struct bpf_prog *prog;
-	struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
+	union {
+		struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
+		u64 user_ctx;
+	};
 };

 struct bpf_prog_array {
@@ -1140,6 +1143,7 @@ int bpf_prog_array_copy_info(struct bpf_prog_array *array,
 int bpf_prog_array_copy(struct bpf_prog_array *old_array,
 			struct bpf_prog *exclude_prog,
 			struct bpf_prog *include_prog,
+			u64 include_user_ctx,
 			struct bpf_prog_array **new_array);

 struct bpf_run_ctx {};
@@ -1149,6 +1153,11 @@ struct bpf_cg_run_ctx {
 	const struct bpf_prog_array_item *prog_item;
 };

+struct bpf_trace_run_ctx {
+	struct bpf_run_ctx run_ctx;
+	u64 user_ctx;
+};
+
 #ifdef CONFIG_BPF_SYSCALL
 static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
 {
@@ -1247,6 +1256,8 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
 	const struct bpf_prog_array_item *item;
 	const struct bpf_prog *prog;
 	const struct bpf_prog_array *array;
+	struct bpf_run_ctx *old_run_ctx;
+	struct bpf_trace_run_ctx run_ctx;
 	u32 ret = 1;

 	migrate_disable();
@@ -1254,11 +1265,14 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu,
 	array = rcu_dereference(array_rcu);
 	if (unlikely(!array))
 		goto out;
+	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 	item = &array->items[0];
 	while ((prog = READ_ONCE(item->prog))) {
+		run_ctx.user_ctx = item->user_ctx;
 		ret &= run_prog(prog, ctx);
 		item++;
 	}
+	bpf_reset_run_ctx(old_run_ctx);
 out:
 	rcu_read_unlock();
 	migrate_enable();
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2d510ad750ed..97ab46802800 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -762,6 +762,7 @@ struct perf_event {
 #ifdef CONFIG_BPF_SYSCALL
 	perf_overflow_handler_t orig_overflow_handler;
 	struct bpf_prog *prog;
+	u64 user_ctx;
 #endif

 #ifdef CONFIG_EVENT_TRACING
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 8ac92560d3a3..4543852f1480 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -675,7 +675,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)

 #ifdef CONFIG_BPF_EVENTS
 unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
-int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
+int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 user_ctx);
 void perf_event_detach_bpf_prog(struct perf_event *event);
 int perf_event_query_prog_array(struct perf_event *event, void __user *info);
 int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
@@ -692,7 +692,7 @@ static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *c
 }

 static inline int
-perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
+perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 user_ctx)
 {
 	return -EOPNOTSUPP;
 }
@@ -803,7 +803,7 @@ extern void ftrace_profile_free_filter(struct perf_event *event);
 void perf_trace_buf_update(void *record, u16 type);
 void *perf_trace_buf_alloc(int size, struct pt_regs **regs, int *rctxp);

-int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog);
+int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 user_ctx);
 void perf_event_free_bpf_prog(struct perf_event *event);

 void bpf_trace_run1(struct bpf_prog *prog, u64 arg1);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 00b1267ab4f0..bc1fd54a8f58 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1448,6 +1448,13 @@ union bpf_attr {
 			__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
 			__u32		iter_info_len;	/* iter_info length */
 		};
+		struct {
+			/* black box user-provided value passed through
+			 * to BPF program at the execution time and
+			 * accessible through bpf_get_user_ctx() BPF helper
+			 */
+			__u64		user_ctx;
+		} perf_event;
 	};
 } link_create;

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9b1577498373..7e4c8bf3e8d1 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -2097,13 +2097,13 @@ int bpf_prog_array_update_at(struct bpf_prog_array *array, int index,
 int bpf_prog_array_copy(struct bpf_prog_array *old_array,
 			struct bpf_prog *exclude_prog,
 			struct bpf_prog *include_prog,
+			u64 include_user_ctx,
 			struct bpf_prog_array **new_array)
 {
 	int new_prog_cnt, carry_prog_cnt = 0;
-	struct bpf_prog_array_item *existing;
+	struct bpf_prog_array_item *existing, *new;
 	struct bpf_prog_array *array;
 	bool found_exclude = false;
-	int new_prog_idx = 0;

 	/* Figure out how many existing progs we need to carry over to
 	 * the new array.
@@ -2140,20 +2140,27 @@ int bpf_prog_array_copy(struct bpf_prog_array *old_array,
 	array = bpf_prog_array_alloc(new_prog_cnt + 1, GFP_KERNEL);
 	if (!array)
 		return -ENOMEM;
+	new = array->items;

 	/* Fill in the new prog array */
 	if (carry_prog_cnt) {
 		existing = old_array->items;
-		for (; existing->prog; existing++)
-			if (existing->prog != exclude_prog &&
-			    existing->prog != &dummy_bpf_prog.prog) {
-				array->items[new_prog_idx++].prog =
-					existing->prog;
-			}
+		for (; existing->prog; existing++) {
+			if (existing->prog == exclude_prog ||
+			    existing->prog == &dummy_bpf_prog.prog)
+				continue;
+
+			new->prog = existing->prog;
+			new->user_ctx = existing->user_ctx;
+			new++;
+		}
 	}
-	if (include_prog)
-		array->items[new_prog_idx++].prog = include_prog;
-	array->items[new_prog_idx].prog = NULL;
+	if (include_prog) {
+		new->prog = include_prog;
+		new->user_ctx = include_user_ctx;
+		new++;
+	}
+	new->prog = NULL;
 	*new_array = array;
 	return 0;
 }
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 80c03bedd6e6..67f82d053935 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2963,7 +2963,7 @@ static int bpf_perf_link_attach(const union bpf_attr *attr, struct bpf_prog *pro
 	}

 	event = perf_file->private_data;
-	err = perf_event_set_bpf_prog(event, prog);
+	err = perf_event_set_bpf_prog(event, prog, attr->link_create.perf_event.user_ctx);
 	if (err) {
 		bpf_link_cleanup(&link_primer);
 		goto out_put_file;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b125943599ce..3dcdf58290eb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5643,7 +5643,7 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);

-		err = perf_event_set_bpf_prog(event, prog);
+		err = perf_event_set_bpf_prog(event, prog, 0);
 		if (err) {
 			bpf_prog_put(prog);
 			return err;
@@ -9936,7 +9936,9 @@ static void bpf_overflow_handler(struct perf_event *event,
 		event->orig_overflow_handler(event, data, regs);
 }

-static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog *prog)
+static int perf_event_set_bpf_handler(struct perf_event *event,
+				      struct bpf_prog *prog,
+				      u64 user_ctx)
 {
 	if (event->overflow_handler_context)
 		/* hw breakpoint or kernel counter */
@@ -9966,6 +9968,7 @@ static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog
 	}

 	event->prog = prog;
+	event->user_ctx = user_ctx;
 	event->orig_overflow_handler = READ_ONCE(event->overflow_handler);
 	WRITE_ONCE(event->overflow_handler, bpf_overflow_handler);
 	return 0;
@@ -9983,7 +9986,9 @@ static void perf_event_free_bpf_handler(struct perf_event *event)
 	bpf_prog_put(prog);
 }
 #else
-static int perf_event_set_bpf_handler(struct perf_event *event, struct bpf_prog *prog)
+static int perf_event_set_bpf_handler(struct perf_event *event,
+				      struct bpf_prog *prog,
+				      u64 user_ctx)
 {
 	return -EOPNOTSUPP;
 }
@@ -10011,12 +10016,13 @@ static inline bool perf_event_is_tracing(struct perf_event *event)
 	return false;
 }

-int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
+int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog,
+			    u64 user_ctx)
 {
 	bool is_kprobe, is_tracepoint, is_syscall_tp;

 	if (!perf_event_is_tracing(event))
-		return perf_event_set_bpf_handler(event, prog);
+		return perf_event_set_bpf_handler(event, prog, user_ctx);

 	is_kprobe = event->tp_event->flags & TRACE_EVENT_FL_UKPROBE;
 	is_tracepoint = event->tp_event->flags & TRACE_EVENT_FL_TRACEPOINT;
@@ -10042,7 +10048,7 @@ int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
 			return -EACCES;
 	}

-	return perf_event_attach_bpf_prog(event, prog);
+	return perf_event_attach_bpf_prog(event, prog, user_ctx);
 }

 void perf_event_free_bpf_prog(struct perf_event *event)
@@ -10064,7 +10070,8 @@ static void perf_event_free_filter(struct perf_event *event)
 {
 }

-int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog)
+int perf_event_set_bpf_prog(struct perf_event *event, struct bpf_prog *prog,
+			    u64 user_ctx)
 {
 	return -ENOENT;
 }
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index b427eac10780..c9cf6a0d0fb3 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1674,7 +1674,8 @@ static DEFINE_MUTEX(bpf_event_mutex);
 #define BPF_TRACE_MAX_PROGS 64

 int perf_event_attach_bpf_prog(struct perf_event *event,
-			       struct bpf_prog *prog)
+			       struct bpf_prog *prog,
+			       u64 user_ctx)
 {
 	struct bpf_prog_array *old_array;
 	struct bpf_prog_array *new_array;
@@ -1701,12 +1702,13 @@ int perf_event_attach_bpf_prog(struct perf_event *event,
 		goto unlock;
 	}

-	ret = bpf_prog_array_copy(old_array, NULL, prog, &new_array);
+	ret = bpf_prog_array_copy(old_array, NULL, prog, user_ctx, &new_array);
 	if (ret < 0)
 		goto unlock;

 	/* set the new array to event->tp_event and set event->prog */
 	event->prog = prog;
+	event->user_ctx = user_ctx;
 	rcu_assign_pointer(event->tp_event->prog_array, new_array);
 	bpf_prog_array_free(old_array);

@@ -1727,7 +1729,7 @@ void perf_event_detach_bpf_prog(struct perf_event *event)
 		goto unlock;

 	old_array = bpf_event_rcu_dereference(event->tp_event->prog_array);
-	ret = bpf_prog_array_copy(old_array, event->prog, NULL, &new_array);
+	ret = bpf_prog_array_copy(old_array, event->prog, NULL, 0, &new_array);
 	if (ret == -ENOENT)
 		goto unlock;
 	if (ret < 0) {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 00b1267ab4f0..bc1fd54a8f58 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1448,6 +1448,13 @@ union bpf_attr {
 			__aligned_u64	iter_info;	/* extra bpf_iter_link_info */
 			__u32		iter_info_len;	/* iter_info length */
 		};
+		struct {
+			/* black box user-provided value passed through
+			 * to BPF program at the execution time and
+			 * accessible through bpf_get_user_ctx() BPF helper
+			 */
+			__u64		user_ctx;
+		} perf_event;
 	};
 } link_create;
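The uapi comment above refers to a bpf_get_user_ctx() helper that this patch only prepares for; the helper itself is added later in the series. Below is a sketch of the intended usage pattern on the BPF program side. The hand-rolled helper declaration and the BPF_FUNC_get_user_ctx constant are placeholders that only exist once the follow-up uapi patch is in the headers, and the section name is arbitrary:

#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>

/* Stand-in for the declaration bpf_helper_defs.h would generate once the
 * helper lands; BPF_FUNC_get_user_ctx does not exist as of this patch.
 */
static __u64 (*bpf_get_user_ctx)(void *ctx) = (void *)BPF_FUNC_get_user_ctx;

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 64);
	__type(key, __u32);
	__type(value, __u64);
} hits SEC(".maps");

/* One program attached to many kprobes; each link is created with a
 * different user_ctx, treated here as an index into the map above.
 */
SEC("kprobe/dispatch")
int handle_kprobe(struct pt_regs *ctx)
{
	__u32 slot = (__u32)bpf_get_user_ctx(ctx);
	__u64 *cnt = bpf_map_lookup_elem(&hits, &slot);

	if (cnt)
		__sync_fetch_and_add(cnt, 1);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";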
Add ability for users to specify custom u64 value when creating BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event, tracepoints).

This is useful for cases when the same BPF program is used for attaching and processing invocation of different tracepoints/kprobes/uprobes in a generic fashion, but such that each invocation is distinguished from each other (e.g., BPF program can look up additional information associated with a specific kernel function without having to rely on function IP lookups). This enables new use cases to be implemented simply and efficiently that previously were possible only through code generation (and thus multiple instances of almost identical BPF program) or compilation at runtime (BCC-style) on target hosts (even more expensive resource-wise). For uprobes it is not even possible in some cases to know function IP before hand (e.g., when attaching to shared library without PID filtering, in which case base load address is not known for a library).

This is done by storing u64 user_ctx in struct bpf_prog_array_item, corresponding to each attached and run BPF program. Given cgroup BPF programs already use 2 8-byte pointers for their needs and cgroup BPF programs don't have (yet?) support for user_ctx, reuse that space through union of cgroup_storage and new user_ctx field.

Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx. This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF program execution code, which luckily is now also split from BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper giving access to this user context value from inside a BPF program. Generic perf_event BPF programs will access this value from perf_event itself through passed in BPF program context.

Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 drivers/media/rc/bpf-lirc.c    |  4 ++--
 include/linux/bpf.h            | 16 +++++++++++++++-
 include/linux/perf_event.h     |  1 +
 include/linux/trace_events.h   |  6 +++---
 include/uapi/linux/bpf.h       |  7 +++++++
 kernel/bpf/core.c              | 29 ++++++++++++++++++-----------
 kernel/bpf/syscall.c           |  2 +-
 kernel/events/core.c           | 21 ++++++++++++++-------
 kernel/trace/bpf_trace.c       |  8 +++++---
 tools/include/uapi/linux/bpf.h |  7 +++++++
 10 files changed, 73 insertions(+), 28 deletions(-)
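The last paragraph of the changelog is the forward-looking part: this patch only plumbs user_ctx into bpf_trace_run_ctx, and a later patch is expected to add the helper that reads it back out. As a rough sketch of what that read-out could look like in kernel/trace/bpf_trace.c, assuming bpf_set_run_ctx() stashes the pointer in current->bpf_ctx (not visible in this diff) and with the func_proto details picked here for illustration rather than taken from the series:

BPF_CALL_1(bpf_get_user_ctx_trace, void *, ctx)
{
	struct bpf_trace_run_ctx *run_ctx;

	/* recover the per-attachment value stored by BPF_PROG_RUN_ARRAY */
	run_ctx = container_of(current->bpf_ctx, struct bpf_trace_run_ctx, run_ctx);
	return run_ctx->user_ctx;
}

static const struct bpf_func_proto bpf_get_user_ctx_proto_trace = {
	.func		= bpf_get_user_ctx_trace,
	.gpl_only	= false,
	.ret_type	= RET_INTEGER,
	.arg1_type	= ARG_PTR_TO_CTX,
};

Hooking such a proto up in the kprobe/tracepoint func_proto lookup would then expose the value that bpf_perf_link_attach() and perf_event_attach_bpf_prog() recorded above.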