[v1,0/3] Use BPF filters for a "perf top -u" workaround

Message ID 20240516041948.3546553-1-irogers@google.com (mailing list archive)

Message

Ian Rogers May 16, 2024, 4:19 a.m. UTC
Allow uid and gid to be terms in BPF filters by first breaking the
connection between filter terms and PERF_SAMPLE_xx values. Calculate
the uid and gid using the bpf_get_current_uid_gid helper, rather than
from a value in the sample. Allow filters to be passed to perf top, which allows:

$ perf top -e cycles:P --filter "uid == $(id -u)"

to work as a "perf top -u" workaround, as "perf top -u" usually fails
due to processes/threads terminating between the /proc scan and the
perf_event_open.
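
For readers unfamiliar with the helper: bpf_get_current_uid_gid returns a
single 64-bit value with the gid in the upper 32 bits and the uid in the
lower 32 bits. Below is a minimal sketch, in plain userspace C, of the
unpacking and of a "uid == N" check. The toy_* names are hypothetical and
this is not the code from the series, which runs inside the BPF program.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: only the bit layout (gid << 32 | uid) is taken
 * from the documented return value of bpf_get_current_uid_gid. */
static inline uint32_t toy_uid(uint64_t uid_gid)
{
	return (uint32_t)(uid_gid & 0xffffffffu); /* low 32 bits: uid */
}

static inline uint32_t toy_gid(uint64_t uid_gid)
{
	return (uint32_t)(uid_gid >> 32); /* high 32 bits: gid */
}

/* Evaluate a filter of the form "uid == wanted" on the packed value. */
static inline bool toy_uid_matches(uint64_t uid_gid, uint32_t wanted)
{
	return toy_uid(uid_gid) == wanted;
}
```

Because the value is read at sample time for the current task, nothing
depends on a task list gathered beforehand, which is what lets the filter
sidestep the /proc scan race described above.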

Ian Rogers (3):
  perf bpf filter: Give terms their own enum
  perf bpf filter: Add uid and gid terms
  perf top: Allow filters on events

 tools/perf/Documentation/perf-record.txt     |  2 +-
 tools/perf/Documentation/perf-top.txt        |  4 ++
 tools/perf/builtin-top.c                     |  9 +++
 tools/perf/util/bpf-filter.c                 | 55 ++++++++++++----
 tools/perf/util/bpf-filter.h                 |  5 +-
 tools/perf/util/bpf-filter.l                 | 66 +++++++++----------
 tools/perf/util/bpf-filter.y                 |  7 +-
 tools/perf/util/bpf_skel/sample-filter.h     | 27 +++++++-
 tools/perf/util/bpf_skel/sample_filter.bpf.c | 67 +++++++++++++++-----
 9 files changed, 172 insertions(+), 70 deletions(-)

Comments

Ian Rogers May 16, 2024, 5:04 a.m. UTC | #1
On Wed, May 15, 2024 at 9:20 PM Ian Rogers <irogers@google.com> wrote:
>
> Allow uid and gid to be terms in BPF filters by first breaking the
> connection between filter terms and PERF_SAMPLE_xx values. Calculate
> the uid and gid using the bpf_get_current_uid_gid helper, rather than
> from a value in the sample. Allow filters to be passed to perf top, this allows:
>
> $ perf top -e cycles:P --filter "uid == $(id -u)"
>
> to work as a "perf top -u" workaround, as "perf top -u" usually fails
> due to processes/threads terminating between the /proc scan and the
> perf_event_open.

Fwiw, something I noticed playing around with this (my workload was
`perf test -w noploop 100000` as different users) is that old samples
appeared to linger around making terminated processes still appear in
the top list. My guess is that there aren't other samples showing up
and pushing the old sample events out of the ring buffers due to the
filter. This can look quite odd and I don't know if we have a way to
improve upon it, flush the ring buffers, histograms, etc. It appears
to be a latent `perf top` issue that you could encounter on other low
frequency events, but I thought I'd mention it anyway.

Thanks,
Ian

Ian Rogers May 16, 2024, 5:34 p.m. UTC | #2
On Wed, May 15, 2024 at 10:04 PM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, May 15, 2024 at 9:20 PM Ian Rogers <irogers@google.com> wrote:
> >
> > Allow uid and gid to be terms in BPF filters by first breaking the
> > connection between filter terms and PERF_SAMPLE_xx values. Calculate
> > the uid and gid using the bpf_get_current_uid_gid helper, rather than
> > from a value in the sample. Allow filters to be passed to perf top, this allows:
> >
> > $ perf top -e cycles:P --filter "uid == $(id -u)"
> >
> > to work as a "perf top -u" workaround, as "perf top -u" usually fails
> > due to processes/threads terminating between the /proc scan and the
> > perf_event_open.
>
> Fwiw, something I noticed playing around with this (my workload was
> `perf test -w noploop 100000` as different users) is that old samples
> appeared to linger around making terminated processes still appear in
> the top list. My guess is that there aren't other samples showing up
> and pushing the old sample events out of the ring buffers due to the
> filter. This can look quite odd and I don't know if we have a way to
> improve upon it, flush the ring buffers, histograms, etc. It appears
> to be a latent `perf top` issue that you could encounter on other low
> frequency events, but I thought I'd mention it anyway.

Some other thoughts:

 - It is kind of annoying with the --filter option (either on top or
record) that there first needs to be an event to filter on. It'd be
nice if we could just filter the default event.

 - Should "perf top --uid=1234" be removed or turned into an alias
for '--filter "uid == $(id -u)"' given the --uid option generally
doesn't work?

 - What should happen to the perf top --pid and --tid options? Should
they be filters? Should they fall back on /proc scanning if there
aren't sufficient BPF permissions? The plumbing for that is going to
be messy.

 - There should probably be a way to filter on cgroups.

 - Does the user care that there are 3 kinds of filter that will work
differently? Could we break them apart to make it more explicit? I may
want tracepoint events with a BPF filter. How can we ensure 1 syntax
for the 3 kinds of filter?

 - Filtering on register values could be potentially interesting, for
example, sampling on memcpy-s where the length is over a threshold. We
have a register capture test:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/record.sh#n81
Perhaps the filter could look something like 'perf record -g -e
mem:$ADDRESS_OF_MEMCPY:x --filter "reg:rdx > 1024"'. This makes me
think we need to make a more convenient way to specify memory
addresses as symbols.
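
To make the register idea concrete, here is a rough sketch of how a single
filter expression such as "reg:rdx > 1024" might be evaluated once the
register value is in hand. It is plain C with hypothetical toy_* names, not
the existing perf filter code, and the "reg:" syntax itself is only the
proposal above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical comparison operators for one filter expression. */
enum toy_op { TOY_EQ, TOY_NEQ, TOY_GT, TOY_GE, TOY_LT, TOY_LE };

/* Return true if the sample should be kept: "sample_value <op> filter_value". */
static bool toy_filter_match(enum toy_op op, uint64_t sample_value,
			     uint64_t filter_value)
{
	switch (op) {
	case TOY_EQ:  return sample_value == filter_value;
	case TOY_NEQ: return sample_value != filter_value;
	case TOY_GT:  return sample_value >  filter_value;
	case TOY_GE:  return sample_value >= filter_value;
	case TOY_LT:  return sample_value <  filter_value;
	case TOY_LE:  return sample_value <= filter_value;
	}
	return false; /* unknown operator: drop the sample */
}
```

For the memcpy case, sample_value would be the captured rdx (the length
argument on x86-64) and filter_value would be 1024.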

Thanks,
Ian

Ian Rogers May 16, 2024, 9:47 p.m. UTC | #3
On Wed, May 15, 2024 at 10:04 PM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, May 15, 2024 at 9:20 PM Ian Rogers <irogers@google.com> wrote:
> >
> > Allow uid and gid to be terms in BPF filters by first breaking the
> > connection between filter terms and PERF_SAMPLE_xx values. Calculate
> > the uid and gid using the bpf_get_current_uid_gid helper, rather than
> > from a value in the sample. Allow filters to be passed to perf top, this allows:
> >
> > $ perf top -e cycles:P --filter "uid == $(id -u)"
> >
> > to work as a "perf top -u" workaround, as "perf top -u" usually fails
> > due to processes/threads terminating between the /proc scan and the
> > perf_event_open.
>
> Fwiw, something I noticed playing around with this (my workload was
> `perf test -w noploop 100000` as different users) is that old samples
> appeared to linger around making terminated processes still appear in
> the top list. My guess is that there aren't other samples showing up
> and pushing the old sample events out of the ring buffers due to the
> filter. This can look quite odd and I don't know if we have a way to
> improve upon it, flush the ring buffers, histograms, etc. It appears
> to be a latent `perf top` issue that you could encounter on other low
> frequency events, but I thought I'd mention it anyway.

Oh, this is expected "perf top" behavior and "-z" fixes it:
```
$ man perf-top
...
       -z, --zero
          Zero history across display updates.
....
```
Why isn't "-z" the default? It would more naturally align with the
behavior of "top". I'll send a patch.
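
As a toy model of the difference, assuming a fixed set of entries and
ignoring whatever ageing perf itself applies to old entries, the sketch
below folds one refresh interval's samples into a display histogram either
cumulatively or with the history zeroed first. All names are hypothetical
and this is not perf's histogram code.

```c
#include <stdint.h>
#include <string.h>

#define TOY_NR_ENTRIES 4 /* hypothetical fixed number of symbols */

struct toy_hist {
	uint64_t samples[TOY_NR_ENTRIES];
};

/* Fold one refresh interval into the histogram. With zero != 0 the
 * previous history is discarded first, as "-z" is described to do. */
static void toy_hist_update(struct toy_hist *h, const uint64_t *interval,
			    int zero)
{
	if (zero)
		memset(h->samples, 0, sizeof(h->samples));
	for (int i = 0; i < TOY_NR_ENTRIES; i++)
		h->samples[i] += interval[i];
}
```

In the cumulative mode, an entry that received samples in an earlier
interval keeps its count after the process exits, which matches the
lingering entries I saw.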

Thanks,
Ian

Namhyung Kim May 18, 2024, 1:21 a.m. UTC | #4
Hi Ian,

On Thu, May 16, 2024 at 10:34 AM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, May 15, 2024 at 10:04 PM Ian Rogers <irogers@google.com> wrote:
> >
> > On Wed, May 15, 2024 at 9:20 PM Ian Rogers <irogers@google.com> wrote:
> > >
> > > Allow uid and gid to be terms in BPF filters by first breaking the
> > > connection between filter terms and PERF_SAMPLE_xx values. Calculate
> > > the uid and gid using the bpf_get_current_uid_gid helper, rather than
> > > from a value in the sample. Allow filters to be passed to perf top, this allows:
> > >
> > > $ perf top -e cycles:P --filter "uid == $(id -u)"
> > >
> > > to work as a "perf top -u" workaround, as "perf top -u" usually fails
> > > due to processes/threads terminating between the /proc scan and the
> > > perf_event_open.
> >
> > Fwiw, something I noticed playing around with this (my workload was
> > `perf test -w noploop 100000` as different users) is that old samples
> > appeared to linger around making terminated processes still appear in
> > the top list. My guess is that there aren't other samples showing up
> > and pushing the old sample events out of the ring buffers due to the
> > filter. This can look quite odd and I don't know if we have a way to
> > improve upon it, flush the ring buffers, histograms, etc. It appears
> > to be a latent `perf top` issue that you could encounter on other low
> > frequency events, but I thought I'd mention it anyway.
>
> Some other thoughts:
>
>  - It is kind of annoying with the --filter option (either on top or
> record) that there first needs to be an event to filter on. It'd be
> nice if we could just filter the default event.

Hmm.. right.  It should work with the default event when
no -e option is given.

>
>  - Should "perf top --uid=1234" be removed or turned into  an alias
> for '--filter "uid == $(id -u)"' given the --uid option generally
> doesn't work?

I think --uid should not fail if it cannot find the task.
I had a similar situation for perf stat --for-each-cgroup
and made it ignore the failures.

>
>  - What should happen to the perf top --pid and --tid options, should
> they be filters? Should they fallback on /proc scanning if there
> aren't sufficient BPF permissions? The plumbing for that is going to
> be messy.

I'm not inclined to do such things.

>
>  - There should probably be a way to filter on cgroups.

+1

>
>  - Does the user care that there are 3 kinds of filter that will work
> differently? Could we break them apart to make it more explicit, I may
> want tracepoint events with a BPF filter. How can we ensure 1 syntax
> for the 3 kinds of filter.
>
>  - Filtering on register values could be potentially interesting, for
> example, sampling on memcpy-s where the length is over a threshold. We
> have a register capture test:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/record.sh#n81
> Perhaps the filter could look something like 'perf record -g -e
> mem:$ADDRESS_OF_MEMCPY:x --filter "reg:rdx > 1024"' -  this makes me
> think we need to make a more convenient way to specify memory
> addresses as symbols.

I've been thinking about a similar idea on uftrace.
It would filter the function based on the value of an
argument or a global variable.

Thanks,
Namhyung

