Message ID: 20240713044645.10840-1-khuey@kylehuey.com (mailing list archive)
State: Not Applicable
Series: perf/bpf: Don't call bpf_overflow_handler() for tracing events
Checks: netdev/tree_selection: success (Not a local patch)
On Fri, Jul 12, 2024 at 09:46:45PM -0700, Kyle Huey wrote:
> The regressing commit is new in 6.10. It assumed that anytime event->prog
> is set bpf_overflow_handler() should be invoked to execute the attached bpf
> program. This assumption is false for tracing events, and as a result the
> regressing commit broke bpftrace by invoking the bpf handler with garbage
> inputs on overflow.
>
> Prior to the regression the overflow handlers formed a chain (of length 0,
> 1, or 2) and perf_event_set_bpf_handler() (the !tracing case) added
> bpf_overflow_handler() to that chain, while perf_event_attach_bpf_prog()
> (the tracing case) did not. Both set event->prog. The chain of overflow
> handlers was replaced by a single overflow handler slot and a fixed call to
> bpf_overflow_handler() when appropriate. This modifies the condition there
> to include !perf_event_is_tracing(), restoring the previous behavior and
> fixing bpftrace.
>
> Signed-off-by: Kyle Huey <khuey@kylehuey.com>
> Reported-by: Joe Damato <jdamato@fastly.com>
> Fixes: f11f10bfa1ca ("perf/bpf: Call BPF handler directly, not through overflow machinery")
> Tested-by: Joe Damato <jdamato@fastly.com> # bpftrace
> Tested-by: Kyle Huey <khuey@kylehuey.com> # bpf overflow handlers
> ---
>  kernel/events/core.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 8f908f077935..f0d7119585dc 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9666,6 +9666,8 @@ static inline void perf_event_free_bpf_handler(struct perf_event *event)
>   * Generic event overflow handling, sampling.
>   */
>
> +static bool perf_event_is_tracing(struct perf_event *event);
> +
>  static int __perf_event_overflow(struct perf_event *event,
>  				 int throttle, struct perf_sample_data *data,
>  				 struct pt_regs *regs)
> @@ -9682,7 +9684,9 @@ static int __perf_event_overflow(struct perf_event *event,
>
>  	ret = __perf_event_account_interrupt(event, throttle);
>
> -	if (event->prog && !bpf_overflow_handler(event, data, regs))
> +	if (event->prog &&
> +	    !perf_event_is_tracing(event) &&
> +	    !bpf_overflow_handler(event, data, regs))
>  		return ret;

ok makes sense, it's better to follow the perf_event_set_bpf_prog condition

Reviewed-by: Jiri Olsa <jolsa@kernel.org>

jirka

>
>  	/*
> @@ -10612,6 +10616,11 @@ void perf_event_free_bpf_prog(struct perf_event *event)
>
>  #else
>
> +static inline bool perf_event_is_tracing(struct perf_event *event)
> +{
> +	return false;
> +}
> +
>  static inline void perf_tp_register(void)
>  {
>  }
> --
> 2.34.1
On Sat, Jul 13, 2024 at 10:32:07PM +0200, Jiri Olsa wrote:
> On Fri, Jul 12, 2024 at 09:46:45PM -0700, Kyle Huey wrote:
> > [...]
>
> ok makes sense, it's better to follow the perf_event_set_bpf_prog condition
>
> Reviewed-by: Jiri Olsa <jolsa@kernel.org>

Urgh, so wth does event_is_tracing do with event->prog? And can't we
clean this up?

That whole perf_event_is_tracing() is a pretty gross function.

Also, I think the default return value of bpf_overflow_handler() is
wrong -- note how if !event->prog we won't call bpf_overflow_handler(),
but if we do call it, but then have !event->prog on the re-read, we
still return 0.
On Mon, Jul 15, 2024 at 4:12 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Urgh, so wth does event_is_tracing do with event->prog? And can't we
> clean this up?

Tracing events keep track of the bpf program in event->prog solely for
cleanup. The bpf programs are stored in and invoked from
event->tp_event->prog_array, but when the event is destroyed it needs
to know which bpf program to remove from that array.

> That whole perf_event_is_tracing() is a pretty gross function.
>
> Also, I think the default return value of bpf_overflow_handler() is
> wrong -- note how if !event->prog we won't call bpf_overflow_handler(),
> but if we do call it, but then have !event->prog on the re-read, we
> still return 0.

The synchronization model here isn't quite clear to me but I don't
think this matters in practice. Once event->prog is set the only
allowed change is for it to be cleared when the perf event is freed.
Anything else is refused by perf_event_set_bpf_handler() with EEXIST.
Can that free race with an overflow handler? I'm not sure, but even if
it can, dropping an overflow for an event that's being freed seems
fine to me. If it can't race then we could remove the condition on the
re-read entirely.

- Kyle
On Mon, Jul 15, 2024 at 07:33:57AM -0700, Kyle Huey wrote:
> On Mon, Jul 15, 2024 at 4:12 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > Urgh, so wth does event_is_tracing do with event->prog? And can't we
> > clean this up?
>
> Tracing events keep track of the bpf program in event->prog solely for
> cleanup. The bpf programs are stored in and invoked from
> event->tp_event->prog_array, but when the event is destroyed it needs
> to know which bpf program to remove from that array.

Yeah, figured it out eventually.. Does look like it needs event->prog
and we can't easily remedy this dual use :/

> > That whole perf_event_is_tracing() is a pretty gross function.
> >
> > Also, I think the default return value of bpf_overflow_handler() is
> > wrong -- note how if !event->prog we won't call bpf_overflow_handler(),
> > but if we do call it, but then have !event->prog on the re-read, we
> > still return 0.
>
> The synchronization model here isn't quite clear to me but I don't
> think this matters in practice. [...] If it can't race then we could
> remove the condition on the re-read entirely.

Right, also rcu_read_lock() is cheap enough to unconditionally do I'm
thinking.

So since we have two distinct users of event->prog, I figured we could
distinguish them from one of the LSB in the pointer value, which then
got me the below.

But now that I see the end result I'm not at all sure this is sane.

But I figure it ought to work...

---
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ab6c4c942f79..5ec78346c2a1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9594,6 +9594,13 @@ static inline bool sample_is_allowed(struct perf_event *event, struct pt_regs *r
 }
 
 #ifdef CONFIG_BPF_SYSCALL
+
+static inline struct bpf_prog *event_prog(struct perf_event *event)
+{
+	unsigned long _prog = (unsigned long)READ_ONCE(event->prog);
+	return (void *)(_prog & ~1);
+}
+
 static int bpf_overflow_handler(struct perf_event *event,
 				struct perf_sample_data *data,
 				struct pt_regs *regs)
@@ -9603,19 +9610,21 @@ static int bpf_overflow_handler(struct perf_event *event,
 		.event = event,
 	};
 	struct bpf_prog *prog;
-	int ret = 0;
+	int ret = 1;
+
+	guard(rcu)();
 
-	ctx.regs = perf_arch_bpf_user_pt_regs(regs);
-	if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
-		goto out;
-	rcu_read_lock();
 	prog = READ_ONCE(event->prog);
-	if (prog) {
+	if (!((unsigned long)prog & 1))
+		return ret;
+
+	prog = (void *)((unsigned long)prog & ~1);
+
+	if (unlikely(__this_cpu_inc_return(bpf_prog_active) == 1)) {
 		perf_prepare_sample(data, event, regs);
+		ctx.regs = perf_arch_bpf_user_pt_regs(regs);
 		ret = bpf_prog_run(prog, &ctx);
 	}
-	rcu_read_unlock();
-out:
 	__this_cpu_dec(bpf_prog_active);
 
 	return ret;
@@ -9652,14 +9661,14 @@ static inline int perf_event_set_bpf_handler(struct perf_event *event,
 		return -EPROTO;
 	}
 
-	event->prog = prog;
+	event->prog = (void *)((unsigned long)prog | 1);
 	event->bpf_cookie = bpf_cookie;
 	return 0;
 }
 
 static inline void perf_event_free_bpf_handler(struct perf_event *event)
 {
-	struct bpf_prog *prog = event->prog;
+	struct bpf_prog *prog = event_prog(event);
 
 	if (!prog)
 		return;
@@ -9707,7 +9716,7 @@ static int __perf_event_overflow(struct perf_event *event,
 
 	ret = __perf_event_account_interrupt(event, throttle);
 
-	if (event->prog && !bpf_overflow_handler(event, data, regs))
+	if (!bpf_overflow_handler(event, data, regs))
 		return ret;
 
 	/*
@@ -12026,10 +12035,10 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		context = parent_event->overflow_handler_context;
 #if defined(CONFIG_BPF_SYSCALL) && defined(CONFIG_EVENT_TRACING)
 		if (parent_event->prog) {
-			struct bpf_prog *prog = parent_event->prog;
+			struct bpf_prog *prog = event_prog(parent_event);
 
 			bpf_prog_inc(prog);
-			event->prog = prog;
+			event->prog = parent_event->prog;
 		}
 #endif
 	}
On Mon, Jul 15, 2024 at 8:04 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Right, also rcu_read_lock() is cheap enough to unconditionally do I'm
> thinking.
>
> So since we have two distinct users of event->prog, I figured we could
> distinguish them from one of the LSB in the pointer value, which then
> got me the below.
>
> But now that I see the end result I'm not at all sure this is sane.
>
> But I figure it ought to work...

I think this would probably work but stealing the bit seems far more
complicated than just gating on perf_event_is_tracing(). Would it
assuage your concerns at all if I made event->prog a simple union
between say handler_prog and sample_prog (still discriminated by
perf_event_is_tracing() where necessary) with appropriate comments and
changed the two code paths accordingly?

- Kyle

> ---
> [...]
On Mon, Jul 15, 2024 at 08:19:44AM -0700, Kyle Huey wrote:

> I think this would probably work but stealing the bit seems far more
> complicated than just gating on perf_event_is_tracing().

perf_event_is_tracing() is something like 3 branches. It is not a simple
conditional. Combined with that re-load and the wrong return value, this
all wants a cleanup.

Using that LSB works, it's just that the code aint pretty.
On Mon, Jul 15, 2024 at 9:30 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> perf_event_is_tracing() is something like 3 branches. It is not a simple
> conditional. Combined with that re-load and the wrong return value, this
> all wants a cleanup.
>
> Using that LSB works, it's just that the code aint pretty.

Maybe we could gate on !event->tp_event instead. Somebody who is more
familiar with this code than me should probably confirm that tp_event
being non-null and perf_event_is_tracing() being true are equivalent
though.

- Kyle
On Mon, Jul 15, 2024 at 09:48:58AM -0700, Kyle Huey wrote:
> Maybe we could gate on !event->tp_event instead. Somebody who is more
> familiar with this code than me should probably confirm that tp_event
> being non-null and perf_event_is_tracing() being true are equivalent
> though.

it looks like that's the case, AFAICS tracepoint/kprobe/uprobe events
are the only ones having the tp_event pointer set, Masami?

fwiw I tried to run bpf selftests with that and it's fine

jirka
On Tue, Jul 16, 2024 at 12:25 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Mon, Jul 15, 2024 at 09:48:58AM -0700, Kyle Huey wrote:
> > Maybe we could gate on !event->tp_event instead. Somebody who is more
> > familiar with this code than me should probably confirm that tp_event
> > being non-null and perf_event_is_tracing() being true are equivalent
> > though.
>
> it looks like that's the case, AFAICS tracepoint/kprobe/uprobe events
> are the only ones having the tp_event pointer set, Masami?
>
> fwiw I tried to run bpf selftests with that and it's fine

Why can't we do the most straightforward thing in this case?

diff --git a/kernel/events/core.c b/kernel/events/core.c
index ab6c4c942f79..cf4645b26c90 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9707,7 +9707,8 @@ static int __perf_event_overflow(struct perf_event *event,
 
 	ret = __perf_event_account_interrupt(event, throttle);
 
-	if (event->prog && !bpf_overflow_handler(event, data, regs))
+	if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
+	    !bpf_overflow_handler(event, data, regs))
 		return ret;
On Tue, 16 Jul 2024 09:25:21 +0200 Jiri Olsa <olsajiri@gmail.com> wrote:
>
> it looks like that's the case, AFAICS tracepoint/kprobe/uprobe events
> are the only ones having the tp_event pointer set, Masami?

Hmm, I think any dynamic_events has tp_event (a struct trace_event_call *)
because it represents the event itself. But yes, if the event is working
like a trace-event, it should have tp_event. So you can use it instead of
perf_event_is_tracing().

Thank you,

> fwiw I tried to run bpf selftests with that and it's fine
>
> jirka
On Fri, Jul 19, 2024 at 11:26 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> Why can't we do the most straightforward thing in this case?
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> [...]
> -	if (event->prog && !bpf_overflow_handler(event, data, regs))
> +	if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> +	    !bpf_overflow_handler(event, data, regs))
>  		return ret;

Yes, that's effectively equivalent to calling perf_event_is_tracing()
and would work too. Do you want to land that patch? It needs to go to
6.10 stable too.

- Kyle
On Fri, Jul 26, 2024 at 5:37 AM Kyle Huey <me@kylehuey.com> wrote:
> [...]
>
> Yes, that's effectively equivalent to calling perf_event_is_tracing()
> and would work too. Do you want to land that patch? It needs to go to
> 6.10 stable too.

I'd appreciate it if you can just incorporate that into your patch and
resend it, thank you!

> - Kyle
Will do.

- Kyle

On Fri, Jul 26, 2024 at 9:34 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> [...]
>
> I'd appreciate it if you can just incorporate that into your patch and
> resend it, thank you!
On Fri, Jul 26, 2024 at 09:35:43AM -0700, Kyle Huey wrote:
> Will do.
>
> - Kyle
>
> [...]

I probably missed the updated patch, but I am happy to test any new
versions, if needed, to ensure that the bug I hit is fixed.

Kyle: please let me know if there's a patch you'd like me to test?

- Joe
On Mon, Aug 05, 2024 at 12:55:17PM +0100, Joe Damato wrote:

[... full thread quote trimmed ...]

> I probably missed the updated patch, but I am happy to test any new
> versions, if needed, to ensure that the bug I hit is fixed.
>
> Kyle: please let me know if there's a patch you'd like me to test?

Sorry for pinging this thread again; let me know if a fix was merged
and I missed it?

Otherwise, if it'd be helpful, I am happy to modify Kyle's patch to
take Andrii's suggestion and resend, but I don't want to step on
anyone's toes :)

- Joe
On Tue, Aug 13, 2024 at 3:37 AM Joe Damato <jdamato@fastly.com> wrote:
>
> On Mon, Aug 05, 2024 at 12:55:17PM +0100, Joe Damato wrote:

[... full thread quote trimmed ...]

> Sorry for pinging this thread again; let me know if a fix was merged
> and I missed it?
>
> Otherwise, if it'd be helpful, I am happy to modify Kyle's patch to
> take Andrii's suggestion and resend, but I don't want to step on
> anyone's toes :)
>
> - Joe

Hi Joe,

You didn't miss anything. I've been remiss in getting back to this.
I'm currently away from home (and the machine I usually do kernel
development on), so if you want to run with this please do so.

Sorry for the inconvenience,

- Kyle
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8f908f077935..f0d7119585dc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9666,6 +9666,8 @@ static inline void perf_event_free_bpf_handler(struct perf_event *event)
  * Generic event overflow handling, sampling.
  */
 
+static bool perf_event_is_tracing(struct perf_event *event);
+
 static int __perf_event_overflow(struct perf_event *event,
 				 int throttle, struct perf_sample_data *data,
 				 struct pt_regs *regs)
@@ -9682,7 +9684,9 @@ static int __perf_event_overflow(struct perf_event *event,
 
 	ret = __perf_event_account_interrupt(event, throttle);
 
-	if (event->prog && !bpf_overflow_handler(event, data, regs))
+	if (event->prog &&
+	    !perf_event_is_tracing(event) &&
+	    !bpf_overflow_handler(event, data, regs))
 		return ret;
 
 	/*
@@ -10612,6 +10616,11 @@ void perf_event_free_bpf_prog(struct perf_event *event)
 
 #else
 
+static inline bool perf_event_is_tracing(struct perf_event *event)
+{
+	return false;
+}
+
 static inline void perf_tp_register(void)
 {
 }