Message ID | 20240516041948.3546553-1-irogers@google.com (mailing list archive) |
---|---|
Series | Use BPF filters for a "perf top -u" workaround |
On Wed, May 15, 2024 at 9:20 PM Ian Rogers <irogers@google.com> wrote:
>
> Allow uid and gid to be terms in BPF filters by first breaking the
> connection between filter terms and PERF_SAMPLE_xx values. Calculate
> the uid and gid using the bpf_get_current_uid_gid helper, rather than
> from a value in the sample. Allow filters to be passed to perf top, this allows:
>
> $ perf top -e cycles:P --filter "uid == $(id -u)"
>
> to work as a "perf top -u" workaround, as "perf top -u" usually fails
> due to processes/threads terminating between the /proc scan and the
> perf_event_open.

Fwiw, something I noticed playing around with this (my workload was
`perf test -w noploop 100000` as different users) is that old samples
appeared to linger around making terminated processes still appear in
the top list. My guess is that there aren't other samples showing up
and pushing the old sample events out of the ring buffers due to the
filter. This can look quite odd and I don't know if we have a way to
improve upon it, flush the ring buffers, histograms, etc. It appears
to be a latent `perf top` issue that you could encounter on other low
frequency events, but I thought I'd mention it anyway.

Thanks,
Ian

> Ian Rogers (3):
>   perf bpf filter: Give terms their own enum
>   perf bpf filter: Add uid and gid terms
>   perf top: Allow filters on events
>
>  tools/perf/Documentation/perf-record.txt     |  2 +-
>  tools/perf/Documentation/perf-top.txt        |  4 ++
>  tools/perf/builtin-top.c                     |  9 +++
>  tools/perf/util/bpf-filter.c                 | 55 ++++++++++++----
>  tools/perf/util/bpf-filter.h                 |  5 +-
>  tools/perf/util/bpf-filter.l                 | 66 +++++++++----------
>  tools/perf/util/bpf-filter.y                 |  7 +-
>  tools/perf/util/bpf_skel/sample-filter.h     | 27 +++++++-
>  tools/perf/util/bpf_skel/sample_filter.bpf.c | 67 +++++++++++++++-----
>  9 files changed, 172 insertions(+), 70 deletions(-)
>
> --
> 2.45.0.rc1.225.g2a3ae87e7f-goog

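
The uid and gid terms described in the cover letter rely on bpf_get_current_uid_gid, which returns a single 64-bit value with the gid in the upper 32 bits and the uid in the lower 32 bits. The following is a minimal userspace C sketch of how such a value could be split and compared against a filter term. It is not the code from the series: the names `filter_op`, `pack_uid_gid`, `unpack_uid`, `unpack_gid`, `term_matches` and `keep_sample_uid` are hypothetical, and `pack_uid_gid` only stands in for the helper so the logic can be exercised outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical comparison operators for a uid/gid filter term. */
enum filter_op { OP_EQ, OP_NE, OP_GT, OP_LT };

/* Stand-in for bpf_get_current_uid_gid(): (gid << 32) | uid. */
static uint64_t pack_uid_gid(uint32_t uid, uint32_t gid)
{
	return ((uint64_t)gid << 32) | uid;
}

/* The uid lives in the lower 32 bits of the packed value. */
static uint32_t unpack_uid(uint64_t uid_gid)
{
	return (uint32_t)(uid_gid & 0xffffffffu);
}

/* The gid lives in the upper 32 bits of the packed value. */
static uint32_t unpack_gid(uint64_t uid_gid)
{
	return (uint32_t)(uid_gid >> 32);
}

/* Evaluate "lhs <op> rhs" for one filter term. */
static bool term_matches(uint32_t lhs, enum filter_op op, uint32_t rhs)
{
	switch (op) {
	case OP_EQ: return lhs == rhs;
	case OP_NE: return lhs != rhs;
	case OP_GT: return lhs > rhs;
	case OP_LT: return lhs < rhs;
	}
	return false;
}

/* Keep a sample only if the current task's uid satisfies the term. */
static bool keep_sample_uid(uint64_t uid_gid, enum filter_op op, uint32_t wanted)
{
	return term_matches(unpack_uid(uid_gid), op, wanted);
}
```

With this model, a filter such as "uid == 1000" keeps a sample when `keep_sample_uid(value, OP_EQ, 1000)` is true, independent of what was recorded in the sample itself.
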
On Wed, May 15, 2024 at 10:04 PM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, May 15, 2024 at 9:20 PM Ian Rogers <irogers@google.com> wrote:
> >
> > Allow uid and gid to be terms in BPF filters by first breaking the
> > connection between filter terms and PERF_SAMPLE_xx values. Calculate
> > the uid and gid using the bpf_get_current_uid_gid helper, rather than
> > from a value in the sample. Allow filters to be passed to perf top, this allows:
> >
> > $ perf top -e cycles:P --filter "uid == $(id -u)"
> >
> > to work as a "perf top -u" workaround, as "perf top -u" usually fails
> > due to processes/threads terminating between the /proc scan and the
> > perf_event_open.
>
> Fwiw, something I noticed playing around with this (my workload was
> `perf test -w noploop 100000` as different users) is that old samples
> appeared to linger around making terminated processes still appear in
> the top list. My guess is that there aren't other samples showing up
> and pushing the old sample events out of the ring buffers due to the
> filter. This can look quite odd and I don't know if we have a way to
> improve upon it, flush the ring buffers, histograms, etc. It appears
> to be a latent `perf top` issue that you could encounter on other low
> frequency events, but I thought I'd mention it anyway.

Some other thoughts:

- It is kind of annoying with the --filter option (either on top or
  record) that there first needs to be an event to filter on. It'd be
  nice if we could just filter the default event.

- Should "perf top --uid=1234" be removed or turned into an alias
  for '--filter "uid == $(id -u)"' given the --uid option generally
  doesn't work?

- What should happen to the perf top --pid and --tid options, should
  they be filters? Should they fallback on /proc scanning if there
  aren't sufficient BPF permissions? The plumbing for that is going to
  be messy.

- There should probably be a way to filter on cgroups.

- Does the user care that there are 3 kinds of filter that will work
  differently? Could we break them apart to make it more explicit, I may
  want tracepoint events with a BPF filter. How can we ensure 1 syntax
  for the 3 kinds of filter.

- Filtering on register values could be potentially interesting, for
  example, sampling on memcpy-s where the length is over a threshold. We
  have a register capture test:
  https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/record.sh#n81
  Perhaps the filter could look something like 'perf record -g -e
  mem:$ADDRESS_OF_MEMCPY:x --filter "reg:rdx > 1024"' - this makes me
  think we need to make a more convenient way to specify memory
  addresses as symbols.

Thanks,
Ian

>
> > Ian Rogers (3):
> >   perf bpf filter: Give terms their own enum
> >   perf bpf filter: Add uid and gid terms
> >   perf top: Allow filters on events
> >
> >  tools/perf/Documentation/perf-record.txt     |  2 +-
> >  tools/perf/Documentation/perf-top.txt        |  4 ++
> >  tools/perf/builtin-top.c                     |  9 +++
> >  tools/perf/util/bpf-filter.c                 | 55 ++++++++++++----
> >  tools/perf/util/bpf-filter.h                 |  5 +-
> >  tools/perf/util/bpf-filter.l                 | 66 +++++++++----------
> >  tools/perf/util/bpf-filter.y                 |  7 +-
> >  tools/perf/util/bpf_skel/sample-filter.h     | 27 +++++++-
> >  tools/perf/util/bpf_skel/sample_filter.bpf.c | 67 +++++++++++++++-----
> >  9 files changed, 172 insertions(+), 70 deletions(-)
> >
> > --
> > 2.45.0.rc1.225.g2a3ae87e7f-goog

On Wed, May 15, 2024 at 10:04 PM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, May 15, 2024 at 9:20 PM Ian Rogers <irogers@google.com> wrote:
> >
> > Allow uid and gid to be terms in BPF filters by first breaking the
> > connection between filter terms and PERF_SAMPLE_xx values. Calculate
> > the uid and gid using the bpf_get_current_uid_gid helper, rather than
> > from a value in the sample. Allow filters to be passed to perf top, this allows:
> >
> > $ perf top -e cycles:P --filter "uid == $(id -u)"
> >
> > to work as a "perf top -u" workaround, as "perf top -u" usually fails
> > due to processes/threads terminating between the /proc scan and the
> > perf_event_open.
>
> Fwiw, something I noticed playing around with this (my workload was
> `perf test -w noploop 100000` as different users) is that old samples
> appeared to linger around making terminated processes still appear in
> the top list. My guess is that there aren't other samples showing up
> and pushing the old sample events out of the ring buffers due to the
> filter. This can look quite odd and I don't know if we have a way to
> improve upon it, flush the ring buffers, histograms, etc. It appears
> to be a latent `perf top` issue that you could encounter on other low
> frequency events, but I thought I'd mention it anyway.

Oh, this is expected "perf top" behavior and "-z" fixes it:

```
$ man perf-top
...
       -z, --zero
           Zero history across display updates.
....
```

Why isn't "-z" the default? It would more naturally align with the
behavior of "top". I'll send a patch.

Thanks,
Ian

> Thanks,
> Ian
>
> > Ian Rogers (3):
> >   perf bpf filter: Give terms their own enum
> >   perf bpf filter: Add uid and gid terms
> >   perf top: Allow filters on events
> >
> >  tools/perf/Documentation/perf-record.txt     |  2 +-
> >  tools/perf/Documentation/perf-top.txt        |  4 ++
> >  tools/perf/builtin-top.c                     |  9 +++
> >  tools/perf/util/bpf-filter.c                 | 55 ++++++++++++----
> >  tools/perf/util/bpf-filter.h                 |  5 +-
> >  tools/perf/util/bpf-filter.l                 | 66 +++++++++----------
> >  tools/perf/util/bpf-filter.y                 |  7 +-
> >  tools/perf/util/bpf_skel/sample-filter.h     | 27 +++++++-
> >  tools/perf/util/bpf_skel/sample_filter.bpf.c | 67 +++++++++++++++-----
> >  9 files changed, 172 insertions(+), 70 deletions(-)
> >
> > --
> > 2.45.0.rc1.225.g2a3ae87e7f-goog

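
To illustrate why a terminated process can stay in the list without "-z", here is a deliberately simplified C model of a display that either accumulates counts across refreshes or zeroes them after each one. It is only a sketch under stated assumptions: `top_view`, `add_sample`, `refresh` and `NR_ENTRIES` are hypothetical names, and the model ignores ring buffers and any decay of old entries that perf top may apply.

```c
#include <stdbool.h>
#include <string.h>

#define NR_ENTRIES 4	/* hypothetical number of tracked tasks */

/* Simplified stand-in for the per-task counts shown by a top-like view. */
struct top_view {
	unsigned long counts[NR_ENTRIES];
	bool zero;	/* models "-z": zero history across display updates */
};

/* Record one sample for the given entry. */
static void add_sample(struct top_view *v, int entry)
{
	v->counts[entry]++;
}

/* Copy out what a refresh would display, then reset the history if asked. */
static void refresh(struct top_view *v, unsigned long shown[NR_ENTRIES])
{
	memcpy(shown, v->counts, sizeof(v->counts));
	if (v->zero)
		memset(v->counts, 0, sizeof(v->counts));
}
```

In this model, an entry that received samples before one refresh and none afterwards keeps its old count in the cumulative view, while the zeroing view shows it as 0 on the next refresh.
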
Hi Ian,

On Thu, May 16, 2024 at 10:34 AM Ian Rogers <irogers@google.com> wrote:
>
> On Wed, May 15, 2024 at 10:04 PM Ian Rogers <irogers@google.com> wrote:
> >
> > On Wed, May 15, 2024 at 9:20 PM Ian Rogers <irogers@google.com> wrote:
> > >
> > > Allow uid and gid to be terms in BPF filters by first breaking the
> > > connection between filter terms and PERF_SAMPLE_xx values. Calculate
> > > the uid and gid using the bpf_get_current_uid_gid helper, rather than
> > > from a value in the sample. Allow filters to be passed to perf top, this allows:
> > >
> > > $ perf top -e cycles:P --filter "uid == $(id -u)"
> > >
> > > to work as a "perf top -u" workaround, as "perf top -u" usually fails
> > > due to processes/threads terminating between the /proc scan and the
> > > perf_event_open.
> >
> > Fwiw, something I noticed playing around with this (my workload was
> > `perf test -w noploop 100000` as different users) is that old samples
> > appeared to linger around making terminated processes still appear in
> > the top list. My guess is that there aren't other samples showing up
> > and pushing the old sample events out of the ring buffers due to the
> > filter. This can look quite odd and I don't know if we have a way to
> > improve upon it, flush the ring buffers, histograms, etc. It appears
> > to be a latent `perf top` issue that you could encounter on other low
> > frequency events, but I thought I'd mention it anyway.
>
> Some other thoughts:
>
> - It is kind of annoying with the --filter option (either on top or
> record) that there first needs to be an event to filter on. It'd be
> nice if we could just filter the default event.

Hmm.. right. It should work with the default event when no -e option
is given.

>
> - Should "perf top --uid=1234" be removed or turned into an alias
> for '--filter "uid == $(id -u)"' given the --uid option generally
> doesn't work?

I think --uid should not fail if it cannot find the task. I had a
similar situation for perf stat --for-each-cgroup and made it ignore
the failures.

>
> - What should happen to the perf top --pid and --tid options, should
> they be filters? Should they fallback on /proc scanning if there
> aren't sufficient BPF permissions? The plumbing for that is going to
> be messy.

I'm not inclined to do such things.

>
> - There should probably be a way to filter on cgroups.

+1

>
> - Does the user care that there are 3 kinds of filter that will work
> differently? Could we break them apart to make it more explicit, I may
> want tracepoint events with a BPF filter. How can we ensure 1 syntax
> for the 3 kinds of filter.
>
> - Filtering on register values could be potentially interesting, for
> example, sampling on memcpy-s where the length is over a threshold. We
> have a register capture test:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/record.sh#n81
> Perhaps the filter could look something like 'perf record -g -e
> mem:$ADDRESS_OF_MEMCPY:x --filter "reg:rdx > 1024"' - this makes me
> think we need to make a more convenient way to specify memory
> addresses as symbols.

I've been thinking about a similar idea on uftrace. It would filter
the function based on the value of an argument or a global variable.

Thanks,
Namhyung

> >
> > > Ian Rogers (3):
> > >   perf bpf filter: Give terms their own enum
> > >   perf bpf filter: Add uid and gid terms
> > >   perf top: Allow filters on events
> > >
> > >  tools/perf/Documentation/perf-record.txt     |  2 +-
> > >  tools/perf/Documentation/perf-top.txt        |  4 ++
> > >  tools/perf/builtin-top.c                     |  9 +++
> > >  tools/perf/util/bpf-filter.c                 | 55 ++++++++++++----
> > >  tools/perf/util/bpf-filter.h                 |  5 +-
> > >  tools/perf/util/bpf-filter.l                 | 66 +++++++++----------
> > >  tools/perf/util/bpf-filter.y                 |  7 +-
> > >  tools/perf/util/bpf_skel/sample-filter.h     | 27 +++++++-
> > >  tools/perf/util/bpf_skel/sample_filter.bpf.c | 67 +++++++++++++++-----
> > >  9 files changed, 172 insertions(+), 70 deletions(-)
> > >
> > > --
> > > 2.45.0.rc1.225.g2a3ae87e7f-goog