Message ID: 20241023000928.957077-4-namhyung@kernel.org (mailing list archive)
State: Not Applicable
On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Like in the software events, the BPF overflow handler can drop samples
> by returning 0. Let's count the dropped samples here too.
>
> Acked-by: Kyle Huey <me@kylehuey.com>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: Song Liu <song@kernel.org>
> Cc: bpf@vger.kernel.org
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  kernel/events/core.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5d24597180dec167..b41c17a0bc19f7c2 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -9831,8 +9831,10 @@ static int __perf_event_overflow(struct perf_event *event,
>  	ret = __perf_event_account_interrupt(event, throttle);
>
>  	if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> -	    !bpf_overflow_handler(event, data, regs))
> +	    !bpf_overflow_handler(event, data, regs)) {
> +		atomic64_inc(&event->dropped_samples);

I don't see the full patch set (please cc relevant people and mailing
list on each patch in the patch set), but do we really want to pay the
price of atomic increment on what's the very typical situation of a
BPF program returning 0?

At least from a BPF perspective this is no "dropping sample", it's
just processing it in BPF and not paying the overhead of the perf
subsystem continuing processing it afterwards. So the dropping part is
also misleading, IMO.

>  		return ret;
> +	}
>
>  	/*
>  	 * XXX event_limit might not quite work as expected on inherited
> --
> 2.47.0.105.g07ac214952-goog
Hello,

On Wed, Oct 23, 2024 at 09:12:52AM -0700, Andrii Nakryiko wrote:
> On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Like in the software events, the BPF overflow handler can drop samples
> > by returning 0. Let's count the dropped samples here too.
> >
[...]
> >  	if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT &&
> > -	    !bpf_overflow_handler(event, data, regs))
> > +	    !bpf_overflow_handler(event, data, regs)) {
> > +		atomic64_inc(&event->dropped_samples);
>
> I don't see the full patch set (please cc relevant people and mailing
> list on each patch in the patch set), but do we really want to pay the

Sorry, you can find the whole series here.

  https://lore.kernel.org/lkml/20241023000928.957077-1-namhyung@kernel.org

I thought it's mostly for the perf part so I didn't CC bpf folks but
I'll do in the next version.

> price of atomic increment on what's the very typical situation of a
> BPF program returning 0?

Is it typical for BPF_PROG_TYPE_PERF_EVENT?  I guess TRACING programs
usually return 0 but PERF_EVENT should care about the return values.

> At least from a BPF perspective this is no "dropping sample", it's
> just processing it in BPF and not paying the overhead of the perf
> subsystem continuing processing it afterwards.  So the dropping part is
> also misleading, IMO.

In the perf tools, we have a filtering logic in BPF to read sample
values and to decide whether we want this sample or not.  In that case
users would be interested in the exact number of samples.

Thanks,
Namhyung
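The filtering Namhyung refers to can be sketched as a minimal BPF_PROG_TYPE_PERF_EVENT program. This is an illustrative sketch, not the actual perf tools filter; the `sample_period` test and the threshold are assumptions made up for the example:

```c
/* Sketch of a perf_event BPF filter: returning 0 makes
 * __perf_event_overflow() stop processing the sample, returning
 * nonzero lets perf emit it as usual. The threshold check below is
 * an illustrative assumption, not the real perf tools logic. */
#include <linux/bpf.h>
#include <linux/bpf_perf_event.h>
#include <bpf/bpf_helpers.h>

SEC("perf_event")
int filter_sample(struct bpf_perf_event_data *ctx)
{
	/* hypothetical filter: ignore samples with a small period */
	if (ctx->sample_period < 10000)
		return 0;	/* "drop": perf skips the sample */
	return 1;		/* keep: sample goes to the ring buffer */
}

char LICENSE[] SEC("license") = "GPL";
```

Such a program is attached to an event with ioctl(PERF_EVENT_IOC_SET_BPF, prog_fd) on the perf event file descriptor; samples it rejects are exactly the ones the proposed dropped_samples counter would count.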
On Wed, Oct 23, 2024 at 11:47 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> On Wed, Oct 23, 2024 at 09:12:52AM -0700, Andrii Nakryiko wrote:
> > On Tue, Oct 22, 2024 at 5:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > Like in the software events, the BPF overflow handler can drop samples
> > > by returning 0. Let's count the dropped samples here too.
> > >
[...]
> > price of atomic increment on what's the very typical situation of a
> > BPF program returning 0?
>
> Is it typical for BPF_PROG_TYPE_PERF_EVENT?  I guess TRACING programs
> usually return 0 but PERF_EVENT should care about the return values.

Yeah, it's pretty much always `return 0;` for perf_event-based BPF
profilers. It's rather unusual to return non-zero, actually.

> > At least from a BPF perspective this is no "dropping sample", it's
> > just processing it in BPF and not paying the overhead of the perf
> > subsystem continuing processing it afterwards. So the dropping part is
> > also misleading, IMO.
>
> In the perf tools, we have a filtering logic in BPF to read sample
> values and to decide whether we want this sample or not. In that case
> users would be interested in the exact number of samples.
>
> Thanks,
> Namhyung
On Wed, Oct 23, 2024 at 12:13:31PM -0700, Andrii Nakryiko wrote:
> On Wed, Oct 23, 2024 at 11:47 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
[...]
> > > price of atomic increment on what's the very typical situation of a
> > > BPF program returning 0?
> >
> > Is it typical for BPF_PROG_TYPE_PERF_EVENT?  I guess TRACING programs
> > usually return 0 but PERF_EVENT should care about the return values.
>
> Yeah, it's pretty much always `return 0;` for perf_event-based BPF
> profilers. It's rather unusual to return non-zero, actually.

Ok, then it may be local_t or plain unsigned long.  It should be
updated on overflow only.  Read can be racy but I think it's ok to
miss some numbers.  If someone really needs a precise count, they can
read it after disabling the event IMHO.

What do you think?

Thanks,
Namhyung
On Wed, Oct 23, 2024 at 1:32 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Oct 23, 2024 at 12:13:31PM -0700, Andrii Nakryiko wrote:
> >
[...]
> > Yeah, it's pretty much always `return 0;` for perf_event-based BPF
> > profilers. It's rather unusual to return non-zero, actually.
>
> Ok, then it may be local_t or plain unsigned long. It should be
> updated on overflow only. Read can be racy but I think it's ok to
> miss some numbers. If someone really needs a precise count, they can
> read it after disabling the event IMHO.
>
> What do you think?

See [0] for unsynchronized increment absolutely killing the
performance due to cache line bouncing between CPUs. If the event is
high-frequency and can be triggered across multiple CPUs in short
succession, even an imprecise counter might be harmful.

In general, I'd say that if BPF attachment is involved, we probably
should avoid maintaining unnecessary statistics. Things like this
event->dropped_samples increment can be done very efficiently and
trivially from inside the BPF program, if at all necessary.

Having said that, if it's unlikely to have perf_event bouncing between
multiple CPUs, it's probably not that big of a deal.

  [0] https://lore.kernel.org/linux-trace-kernel/20240813203409.3985398-1-andrii@kernel.org/

> Thanks,
> Namhyung
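Counting from inside the BPF program, as Andrii suggests, could look like the following sketch. The per-CPU map, its name, and its layout are assumptions for illustration, not code from the patch set:

```c
/* Sketch: keeping the "dropped" count inside the BPF program with a
 * per-CPU array map. Each CPU bumps only its own slot, so there is no
 * cross-CPU cache line bouncing; userspace sums the per-CPU slots
 * whenever it reads the map. */
#include <linux/bpf.h>
#include <linux/bpf_perf_event.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} drop_count SEC(".maps");

SEC("perf_event")
int count_and_drop(struct bpf_perf_event_data *ctx)
{
	__u32 key = 0;
	__u64 *cnt = bpf_map_lookup_elem(&drop_count, &key);

	if (cnt)
		(*cnt)++;	/* per-CPU slot: no shared cache line */
	return 0;		/* drop the sample */
}

char LICENSE[] SEC("license") = "GPL";
```

When userspace looks up a BPF_MAP_TYPE_PERCPU_ARRAY element, it receives one value per possible CPU and sums them, so the kernel never has to maintain a shared counter on the hot path.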
On Wed, Oct 23, 2024 at 02:24:13PM -0700, Andrii Nakryiko wrote:
> On Wed, Oct 23, 2024 at 1:32 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
[...]
> See [0] for unsynchronized increment absolutely killing the
> performance due to cache line bouncing between CPUs. If the event is
> high-frequency and can be triggered across multiple CPUs in short
> succession, even an imprecise counter might be harmful.

Ok.

> In general, I'd say that if BPF attachment is involved, we probably
> should avoid maintaining unnecessary statistics. Things like this
> event->dropped_samples increment can be done very efficiently and
> trivially from inside the BPF program, if at all necessary.

Right, we can do that in the BPF too.

> Having said that, if it's unlikely to have perf_event bouncing between
> multiple CPUs, it's probably not that big of a deal.

Yeah, perf_event is dedicated to a CPU or a task and the counter is
updated only in the overflow handler.  So I don't think it'd cause
cache line bouncing between CPUs.

Thanks,
Namhyung

> [0] https://lore.kernel.org/linux-trace-kernel/20240813203409.3985398-1-andrii@kernel.org/