From patchwork Tue Aug 13 23:41:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Namhyung Kim X-Patchwork-Id: 13762700 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8F8DB22EEF; Tue, 13 Aug 2024 23:41:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723592479; cv=none; b=ZaoPvDfnzlsO6phqdN/7R3oRe1wI6Es/6BuSDGFhg7Uk6/sUM/k2B24MGLnueGavHKXwRtevMX11VEX0CAeLm37PeyVqOlMO0b0o0afqn7xwi3riTahwqHdgVQYX9Vo1IJ1EzGMtvfKrKZDi9e/ManNigwB/iOircxLNOmyo4DU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723592479; c=relaxed/simple; bh=6mhEb9oLARyOU3kjN+5aPxpbXt4p56vv9zgrmma9/II=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version; b=lphIp1ZQTm0rLOuMkYrkTeY2ory/DZDID8FNuSd4thnx/b7GXJZ1CKYDl9pt30eqqLo0AJuTi9HmPachlKSCWKiBYZDfC2HtTjGr5nAx/9eENXRd7XcrHq71qsB4h10ILQ0K3+0rozHkjmymCNxGOLdElP3+t2zINmwn5Jz4Erg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=RpkMtHJ8; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="RpkMtHJ8" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 871BDC32782; Tue, 13 Aug 2024 23:41:18 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1723592479; bh=6mhEb9oLARyOU3kjN+5aPxpbXt4p56vv9zgrmma9/II=; h=From:To:Cc:Subject:Date:From; b=RpkMtHJ8c1EwbhQmnTFMgOUystM6u1xcbR4MIPTt1nRBwFxVD20N/4aHKbptfH0l9 ewF4r+fWOqOJ/oeVbI6Rif0uivVpRKKuV1XdXmmGtWBbkETbrSewwba1Cg2eo94MVQ e4jKRYeWMQsyOppJLk+Sl/sN1w30CMggvOTWqWGhqyuIp2vdLhC1W0LR/tCOareCmE sgbgxzMi1rmU4Ohvwzf+Ol/uU9CKFlDg3UlrALOsyuM1FwnbstgqUU2Lw+t9yGVu7T nA+Dx7C/VmXMwumwwkJj6jJ7f1izJsbM+qlP1rt29WMMDEVGYMAAwqUsBvt0SWPegb uq6dly+NnYjkQ== From: Namhyung Kim To: Arnaldo Carvalho de Melo , Ian Rogers , Kan Liang Cc: Jiri Olsa , Adrian Hunter , Peter Zijlstra , Ingo Molnar , LKML , linux-perf-users@vger.kernel.org, KP Singh , Song Liu , bpf@vger.kernel.org, Stephane Eranian Subject: [PATCH v2 1/2] perf bpf-filter: Support multiple events properly Date: Tue, 13 Aug 2024 16:41:16 -0700 Message-ID: <20240813234117.2265235-1-namhyung@kernel.org> X-Mailer: git-send-email 2.46.0.76.ge559c4bf1a-goog Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 So far it used tgid as a key to get the filter expressions in the pinned filters map for regular users but it won't work well if the has more than one filters at the same time. Let's add the event id to the key of the filter hash map so that it can identify the right filter expression in the BPF program. As the event can be inherited to child tasks, it should use the primary id which belongs to the parent (original) event. Since evsel opens the event for multiple CPUs and tasks, it needs to maintain a separate hash map for the event id. In the user space, it keeps a list for the multiple evsel and release the entries in the both hash map when it closes the event. Signed-off-by: Namhyung Kim --- tools/perf/util/bpf-filter.c | 288 ++++++++++++++++--- tools/perf/util/bpf_skel/sample-filter.h | 11 +- tools/perf/util/bpf_skel/sample_filter.bpf.c | 42 ++- tools/perf/util/bpf_skel/vmlinux/vmlinux.h | 5 + 4 files changed, 304 insertions(+), 42 deletions(-) diff --git a/tools/perf/util/bpf-filter.c b/tools/perf/util/bpf-filter.c index c5eb0b7eec19..0a1832564dd2 100644 --- a/tools/perf/util/bpf-filter.c +++ b/tools/perf/util/bpf-filter.c @@ -1,4 +1,45 @@ /* SPDX-License-Identifier: GPL-2.0 */ +/** + * Generic event filter for sampling events in BPF. + * + * The BPF program is fixed and just to read filter expressions in the 'filters' + * map and compare the sample data in order to reject samples that don't match. + * Each filter expression contains a sample flag (term) to compare, an operation + * (==, >=, and so on) and a value. + * + * Note that each entry has an array of filter expressions and it only succeeds + * when all of the expressions are satisfied. But it supports the logical OR + * using a GROUP operation which is satisfied when any of its member expression + * is evaluated to true. But it doesn't allow nested GROUP operations for now. + * + * To support non-root users, the filters map can be loaded and pinned in the BPF + * filesystem by root (perf record --setup-filter pin). Then each user will get + * a new entry in the shared filters map to fill the filter expressions. And the + * BPF program will find the filter using (task-id, event-id) as a key. + * + * The pinned BPF object (shared for regular users) has: + * + * event_hash | + * | | | + * event->id ---> | id | ---+ idx_hash | filters + * | | | | | | | | + * | .... | +-> | idx | --+--> | exprs | ---> perf_bpf_filter_entry[] + * | | | | | | .op + * task id (tgid) --------------+ | .... | | | ... | .term (+ part) + * | .value + * | + * ======= (root would skip this part) ======== (compares it in a loop) + * + * This is used for per-task use cases while system-wide profiling (normally from + * root user) uses a separate copy of the program and the maps for its own so that + * it can proceed even if a lot of non-root users are using the filters at the + * same time. In this case the filters map has a single entry and no need to use + * the hash maps to get the index (key) of the filters map (IOW it's always 0). + * + * The BPF program returns 1 to accept the sample or 0 to drop it. + * The 'dropped' map is to keep how many samples it dropped by the filter and + * it will be reported as lost samples. + */ #include #include #include @@ -6,6 +47,7 @@ #include #include +#include #include #include #include @@ -27,7 +69,14 @@ #define PERF_SAMPLE_TYPE(_st, opt) __PERF_SAMPLE_TYPE(PBF_TERM_##_st, PERF_SAMPLE_##_st, opt) /* Index in the pinned 'filters' map. Should be released after use. */ -static int pinned_filter_idx = -1; +struct pinned_filter_idx { + struct list_head list; + struct evsel *evsel; + u64 event_id; + int hash_idx; +}; + +static LIST_HEAD(pinned_filters); static const struct perf_sample_info { enum perf_bpf_filter_term type; @@ -175,24 +224,145 @@ static int convert_to_tgid(int tid) return tgid; } -static int update_pid_hash(struct evsel *evsel, struct perf_bpf_filter_entry *entry) +/* + * The event might be closed already so we cannot get the list of ids using FD + * like in create_event_hash() below, let's iterate the event_hash map and + * delete all entries that have the event id as a key. + */ +static void destroy_event_hash(u64 event_id) +{ + int fd; + u64 key, *prev_key = NULL; + int num = 0, alloced = 32; + u64 *ids = calloc(alloced, sizeof(*ids)); + + if (ids == NULL) + return; + + fd = get_pinned_fd("event_hash"); + if (fd < 0) { + pr_debug("cannot get fd for 'event_hash' map\n"); + free(ids); + return; + } + + /* Iterate the whole map to collect keys for the event id. */ + while (!bpf_map_get_next_key(fd, prev_key, &key)) { + u64 id; + + if (bpf_map_lookup_elem(fd, &key, &id) == 0 && id == event_id) { + if (num == alloced) { + void *tmp; + + alloced *= 2; + tmp = realloc(ids, alloced * sizeof(*ids)); + if (tmp == NULL) + break; + + ids = tmp; + } + ids[num++] = key; + } + + prev_key = &key; + } + + for (int i = 0; i < num; i++) + bpf_map_delete_elem(fd, &ids[i]); + + free(ids); + close(fd); +} + +/* + * Return a representative id if ok, or 0 for failures. + * + * The perf_event->id is good for this, but an evsel would have multiple + * instances for CPUs and tasks. So pick up the first id and setup a hash + * from id of each instance to the representative id (the first one). + */ +static u64 create_event_hash(struct evsel *evsel) +{ + int x, y, fd; + u64 the_id = 0, id; + + fd = get_pinned_fd("event_hash"); + if (fd < 0) { + pr_err("cannot get fd for 'event_hash' map\n"); + return 0; + } + + for (x = 0; x < xyarray__max_x(evsel->core.fd); x++) { + for (y = 0; y < xyarray__max_y(evsel->core.fd); y++) { + int ret = ioctl(FD(evsel, x, y), PERF_EVENT_IOC_ID, &id); + + if (ret < 0) { + pr_err("Failed to get the event id\n"); + if (the_id) + destroy_event_hash(the_id); + return 0; + } + + if (the_id == 0) + the_id = id; + + bpf_map_update_elem(fd, &id, &the_id, BPF_ANY); + } + } + + close(fd); + return the_id; +} + +static void destroy_idx_hash(struct pinned_filter_idx *pfi) +{ + int fd, nr; + struct perf_thread_map *threads; + + fd = get_pinned_fd("filters"); + bpf_map_delete_elem(fd, &pfi->hash_idx); + close(fd); + + if (pfi->event_id) + destroy_event_hash(pfi->event_id); + + threads = perf_evsel__threads(&pfi->evsel->core); + if (threads == NULL) + return; + + fd = get_pinned_fd("idx_hash"); + nr = perf_thread_map__nr(threads); + for (int i = 0; i < nr; i++) { + /* The target task might be dead already, just try the pid */ + struct idx_hash_key key = { + .evt_id = pfi->event_id, + .tgid = perf_thread_map__pid(threads, i), + }; + + bpf_map_delete_elem(fd, &key); + } + close(fd); +} + +/* Maintain a hashmap from (tgid, event-id) to filter index */ +static int create_idx_hash(struct evsel *evsel, struct perf_bpf_filter_entry *entry) { int filter_idx; int fd, nr, last; + u64 event_id = 0; + struct pinned_filter_idx *pfi = NULL; struct perf_thread_map *threads; fd = get_pinned_fd("filters"); if (fd < 0) { - pr_debug("cannot get fd for 'filters' map\n"); + pr_err("cannot get fd for 'filters' map\n"); return fd; } /* Find the first available entry in the filters map */ for (filter_idx = 0; filter_idx < MAX_FILTERS; filter_idx++) { - if (bpf_map_update_elem(fd, &filter_idx, entry, BPF_NOEXIST) == 0) { - pinned_filter_idx = filter_idx; + if (bpf_map_update_elem(fd, &filter_idx, entry, BPF_NOEXIST) == 0) break; - } } close(fd); @@ -201,22 +371,44 @@ static int update_pid_hash(struct evsel *evsel, struct perf_bpf_filter_entry *en return -EBUSY; } + pfi = zalloc(sizeof(*pfi)); + if (pfi == NULL) { + pr_err("Cannot save pinned filter index\n"); + goto err; + } + + pfi->evsel = evsel; + pfi->hash_idx = filter_idx; + + event_id = create_event_hash(evsel); + if (event_id == 0) { + pr_err("Cannot update the event hash\n"); + goto err; + } + + pfi->event_id = event_id; + threads = perf_evsel__threads(&evsel->core); if (threads == NULL) { pr_err("Cannot get the thread list of the event\n"); - return -EINVAL; + goto err; } /* save the index to a hash map */ - fd = get_pinned_fd("pid_hash"); - if (fd < 0) - return fd; + fd = get_pinned_fd("idx_hash"); + if (fd < 0) { + pr_err("cannot get fd for 'idx_hash' map\n"); + goto err; + } last = -1; nr = perf_thread_map__nr(threads); for (int i = 0; i < nr; i++) { int pid = perf_thread_map__pid(threads, i); int tgid; + struct idx_hash_key key = { + .evt_id = event_id, + }; /* it actually needs tgid, let's get tgid from /proc. */ tgid = convert_to_tgid(pid); @@ -228,16 +420,25 @@ static int update_pid_hash(struct evsel *evsel, struct perf_bpf_filter_entry *en if (tgid == last) continue; last = tgid; + key.tgid = tgid; - if (bpf_map_update_elem(fd, &tgid, &filter_idx, BPF_ANY) < 0) { - pr_err("Failed to update the pid hash\n"); + if (bpf_map_update_elem(fd, &key, &filter_idx, BPF_ANY) < 0) { + pr_err("Failed to update the idx_hash\n"); close(fd); - return -1; + goto err; } - pr_debug("pid hash: %d -> %d\n", tgid, filter_idx); + pr_debug("bpf-filter: idx_hash (task=%d,%s) -> %d\n", + tgid, evsel__name(evsel), filter_idx); } + + list_add(&pfi->list, &pinned_filters); close(fd); - return 0; + return filter_idx; + +err: + destroy_idx_hash(pfi); + free(pfi); + return -1; } int perf_bpf_filter__prepare(struct evsel *evsel, struct target *target) @@ -247,7 +448,7 @@ int perf_bpf_filter__prepare(struct evsel *evsel, struct target *target) struct bpf_program *prog; struct bpf_link *link; struct perf_bpf_filter_entry *entry; - bool needs_pid_hash = !target__has_cpu(target) && !target->uid_str; + bool needs_idx_hash = !target__has_cpu(target) && !target->uid_str; entry = calloc(MAX_FILTERS, sizeof(*entry)); if (entry == NULL) @@ -259,11 +460,11 @@ int perf_bpf_filter__prepare(struct evsel *evsel, struct target *target) goto err; } - if (needs_pid_hash && geteuid() != 0) { + if (needs_idx_hash && geteuid() != 0) { int zero = 0; /* The filters map is shared among other processes */ - ret = update_pid_hash(evsel, entry); + ret = create_idx_hash(evsel, entry); if (ret < 0) goto err; @@ -274,7 +475,7 @@ int perf_bpf_filter__prepare(struct evsel *evsel, struct target *target) } /* Reset the lost count */ - bpf_map_update_elem(fd, &pinned_filter_idx, &zero, BPF_ANY); + bpf_map_update_elem(fd, &ret, &zero, BPF_ANY); close(fd); fd = get_pinned_fd("perf_sample_filter"); @@ -288,6 +489,7 @@ int perf_bpf_filter__prepare(struct evsel *evsel, struct target *target) ret = ioctl(FD(evsel, x, y), PERF_EVENT_IOC_SET_BPF, fd); if (ret < 0) { pr_err("Failed to attach perf sample-filter\n"); + close(fd); goto err; } } @@ -332,6 +534,15 @@ int perf_bpf_filter__prepare(struct evsel *evsel, struct target *target) err: free(entry); + if (!list_empty(&pinned_filters)) { + struct pinned_filter_idx *pfi, *tmp; + + list_for_each_entry_safe(pfi, tmp, &pinned_filters, list) { + destroy_idx_hash(pfi); + list_del(&pfi->list); + free(pfi); + } + } sample_filter_bpf__destroy(skel); return ret; } @@ -339,6 +550,7 @@ int perf_bpf_filter__prepare(struct evsel *evsel, struct target *target) int perf_bpf_filter__destroy(struct evsel *evsel) { struct perf_bpf_filter_expr *expr, *tmp; + struct pinned_filter_idx *pfi, *pos; list_for_each_entry_safe(expr, tmp, &evsel->bpf_filters, list) { list_del(&expr->list); @@ -346,14 +558,11 @@ int perf_bpf_filter__destroy(struct evsel *evsel) } sample_filter_bpf__destroy(evsel->bpf_skel); - if (pinned_filter_idx >= 0) { - int fd = get_pinned_fd("filters"); - - bpf_map_delete_elem(fd, &pinned_filter_idx); - pinned_filter_idx = -1; - close(fd); + list_for_each_entry_safe(pfi, pos, &pinned_filters, list) { + destroy_idx_hash(pfi); + list_del(&pfi->list); + free(pfi); } - return 0; } @@ -364,10 +573,20 @@ u64 perf_bpf_filter__lost_count(struct evsel *evsel) if (list_empty(&evsel->bpf_filters)) return 0; - if (pinned_filter_idx >= 0) { + if (!list_empty(&pinned_filters)) { int fd = get_pinned_fd("dropped"); + struct pinned_filter_idx *pfi; + + if (fd < 0) + return 0; - bpf_map_lookup_elem(fd, &pinned_filter_idx, &count); + list_for_each_entry(pfi, &pinned_filters, list) { + if (pfi->evsel != evsel) + continue; + + bpf_map_lookup_elem(fd, &pfi->hash_idx, &count); + break; + } close(fd); } else if (evsel->bpf_skel) { struct sample_filter_bpf *skel = evsel->bpf_skel; @@ -429,9 +648,10 @@ int perf_bpf_filter__pin(void) /* pinned program will use pid-hash */ bpf_map__set_max_entries(skel->maps.filters, MAX_FILTERS); - bpf_map__set_max_entries(skel->maps.pid_hash, MAX_PIDS); + bpf_map__set_max_entries(skel->maps.event_hash, MAX_EVT_HASH); + bpf_map__set_max_entries(skel->maps.idx_hash, MAX_IDX_HASH); bpf_map__set_max_entries(skel->maps.dropped, MAX_FILTERS); - skel->rodata->use_pid_hash = 1; + skel->rodata->use_idx_hash = 1; if (sample_filter_bpf__load(skel) < 0) { ret = -errno; @@ -484,8 +704,12 @@ int perf_bpf_filter__pin(void) pr_debug("chmod for filters failed\n"); ret = -errno; } - if (fchmodat(dir_fd, "pid_hash", 0666, 0) < 0) { - pr_debug("chmod for pid_hash failed\n"); + if (fchmodat(dir_fd, "event_hash", 0666, 0) < 0) { + pr_debug("chmod for event_hash failed\n"); + ret = -errno; + } + if (fchmodat(dir_fd, "idx_hash", 0666, 0) < 0) { + pr_debug("chmod for idx_hash failed\n"); ret = -errno; } if (fchmodat(dir_fd, "dropped", 0666, 0) < 0) { diff --git a/tools/perf/util/bpf_skel/sample-filter.h b/tools/perf/util/bpf_skel/sample-filter.h index e666bfd5fbdd..5f0c8e4e83d3 100644 --- a/tools/perf/util/bpf_skel/sample-filter.h +++ b/tools/perf/util/bpf_skel/sample-filter.h @@ -1,8 +1,9 @@ #ifndef PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H #define PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H -#define MAX_FILTERS 64 -#define MAX_PIDS (16 * 1024) +#define MAX_FILTERS 64 +#define MAX_IDX_HASH (16 * 1024) +#define MAX_EVT_HASH (1024 * 1024) /* supported filter operations */ enum perf_bpf_filter_op { @@ -62,4 +63,10 @@ struct perf_bpf_filter_entry { __u64 value; }; +struct idx_hash_key { + __u64 evt_id; + __u32 tgid; + __u32 reserved; +}; + #endif /* PERF_UTIL_BPF_SKEL_SAMPLE_FILTER_H */ diff --git a/tools/perf/util/bpf_skel/sample_filter.bpf.c b/tools/perf/util/bpf_skel/sample_filter.bpf.c index 4c75354b84fd..4872a16eedfd 100644 --- a/tools/perf/util/bpf_skel/sample_filter.bpf.c +++ b/tools/perf/util/bpf_skel/sample_filter.bpf.c @@ -15,13 +15,25 @@ struct filters { __uint(max_entries, 1); } filters SEC(".maps"); -/* tgid to filter index */ -struct pid_hash { +/* + * An evsel has multiple instances for each CPU or task but we need a single + * id to be used as a key for the idx_hash. This hashmap would translate the + * instance's ID to a representative ID. + */ +struct event_hash { __uint(type, BPF_MAP_TYPE_HASH); - __type(key, int); + __type(key, __u64); + __type(value, __u64); + __uint(max_entries, 1); +} event_hash SEC(".maps"); + +/* tgid/evtid to filter index */ +struct idx_hash { + __uint(type, BPF_MAP_TYPE_HASH); + __type(key, struct idx_hash_key); __type(value, int); __uint(max_entries, 1); -} pid_hash SEC(".maps"); +} idx_hash SEC(".maps"); /* tgid to filter index */ struct lost_count { @@ -31,7 +43,7 @@ struct lost_count { __uint(max_entries, 1); } dropped SEC(".maps"); -volatile const int use_pid_hash; +volatile const int use_idx_hash; void *bpf_cast_to_kern_ctx(void *) __ksym; @@ -202,11 +214,25 @@ int perf_sample_filter(void *ctx) k = 0; - if (use_pid_hash) { - int tgid = bpf_get_current_pid_tgid() >> 32; + if (use_idx_hash) { + struct idx_hash_key key = { + .tgid = bpf_get_current_pid_tgid() >> 32, + }; + __u64 eid = kctx->event->id; + __u64 *key_id; int *idx; - idx = bpf_map_lookup_elem(&pid_hash, &tgid); + /* get primary_event_id */ + if (kctx->event->parent) + eid = kctx->event->parent->id; + + key_id = bpf_map_lookup_elem(&event_hash, &eid); + if (key_id == NULL) + goto drop; + + key.evt_id = *key_id; + + idx = bpf_map_lookup_elem(&idx_hash, &key); if (idx) k = *idx; else diff --git a/tools/perf/util/bpf_skel/vmlinux/vmlinux.h b/tools/perf/util/bpf_skel/vmlinux/vmlinux.h index d818e30c5457..4fa21468487e 100644 --- a/tools/perf/util/bpf_skel/vmlinux/vmlinux.h +++ b/tools/perf/util/bpf_skel/vmlinux/vmlinux.h @@ -175,6 +175,11 @@ struct perf_sample_data { u64 code_page_size; } __attribute__((__aligned__(64))) __attribute__((preserve_access_index)); +struct perf_event { + struct perf_event *parent; + u64 id; +} __attribute__((preserve_access_index)); + struct bpf_perf_event_data_kern { struct perf_sample_data *data; struct perf_event *event; From patchwork Tue Aug 13 23:41:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Namhyung Kim X-Patchwork-Id: 13762701 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1F35B1AC450; Tue, 13 Aug 2024 23:41:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723592480; cv=none; b=STJ+acLik5A2TQqp4C2haEDPHe7+FG9XiX7hGzlwpJ0TKnTruSwsL8bdun6o1LDRjRDsfNsN6/vqth/pIargDKLF4ApWta2jil7rzX/Emx2BWEvViGcE0BEx3QWp96t5jcZ/ByaaMYnkVB9A2BrvqTyIXlp5cWaX9VfdGeKyq6c= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723592480; c=relaxed/simple; bh=9+G8X6mu1em8ktbp6Mkpzmj/jz1vEg2OnZe0C/AQ+Fs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uQwtQ0qjmRi1JvyNi1fM1ShTiikcsULN3IMiaD8N3+fqN53a6O+9JVKcxjabTMQAqL1AcfQXdFyOp6mY/7OSGWMeN9q6kMqObVMoQWFtVc1lK3GYtZ782pVJsh/NDvKGbZdagxUkXBc1C9FJWkbhzRZeWKHfd+BmQxnxe83Z2nY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=hWVkqTdb; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="hWVkqTdb" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 2EF4AC4AF0C; Tue, 13 Aug 2024 23:41:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1723592479; bh=9+G8X6mu1em8ktbp6Mkpzmj/jz1vEg2OnZe0C/AQ+Fs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=hWVkqTdbezOB1fC6mLqI9EYI/y+hWCfl/aPkhCTYAdixhFhYX0oNEGOmxGOtu8P+k rcwlFEg3haECU0ou5Y4nhw2xI2drYKu9OuhbIqO2nTFhWo3qUaZovyI2zaP6vLvnk9 W09TBJS4bBLNfgrPpAara+eEYXpG7lKVDvgkTwyhtOyRscvd93etRAfERjTtgiHCJH ojdrauKu2l6CFJ0HRMLQSRmghejAGjSeUgw6Y784lryQkUPPwla28nylVoX8xoGbRr vlXXjGZsj9xDmPJFlhXPTUbL2YZp5Sl2qlGl3cVJm/QpBIYl0bV15wyY6i6Fee1M0M erhCnxFkXw+Fw== From: Namhyung Kim To: Arnaldo Carvalho de Melo , Ian Rogers , Kan Liang Cc: Jiri Olsa , Adrian Hunter , Peter Zijlstra , Ingo Molnar , LKML , linux-perf-users@vger.kernel.org, KP Singh , Song Liu , bpf@vger.kernel.org, Stephane Eranian Subject: [PATCH v2 2/2] perf tools: Print lost samples due to BPF filter Date: Tue, 13 Aug 2024 16:41:17 -0700 Message-ID: <20240813234117.2265235-2-namhyung@kernel.org> X-Mailer: git-send-email 2.46.0.76.ge559c4bf1a-goog In-Reply-To: <20240813234117.2265235-1-namhyung@kernel.org> References: <20240813234117.2265235-1-namhyung@kernel.org> Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Print the actual dropped sample count in the event stat. $ sudo perf record -o- -e cycles --filter 'period < 10000' \ -e instructions --filter 'ip > 0x8000000000000000' perf test -w noploop | \ perf report --stat -i- [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.058 MB - ] Aggregated stats: TOTAL events: 469 MMAP events: 268 (57.1%) COMM events: 2 ( 0.4%) EXIT events: 1 ( 0.2%) SAMPLE events: 16 ( 3.4%) MMAP2 events: 22 ( 4.7%) LOST_SAMPLES events: 2 ( 0.4%) KSYMBOL events: 89 (19.0%) BPF_EVENT events: 39 ( 8.3%) ATTR events: 2 ( 0.4%) FINISHED_ROUND events: 1 ( 0.2%) ID_INDEX events: 1 ( 0.2%) THREAD_MAP events: 1 ( 0.2%) CPU_MAP events: 1 ( 0.2%) EVENT_UPDATE events: 2 ( 0.4%) TIME_CONV events: 1 ( 0.2%) FEATURE events: 20 ( 4.3%) FINISHED_INIT events: 1 ( 0.2%) cycles stats: SAMPLE events: 2 LOST_SAMPLES (BPF) events: 4010 instructions stats: SAMPLE events: 14 LOST_SAMPLES (BPF) events: 3990 Signed-off-by: Namhyung Kim --- tools/perf/builtin-report.c | 9 +++++++-- tools/perf/ui/stdio/hist.c | 4 ++-- tools/perf/util/events_stats.h | 15 ++++++++++++++- tools/perf/util/hist.c | 19 +++++++++++++++---- tools/perf/util/hist.h | 1 + tools/perf/util/machine.c | 5 +++-- tools/perf/util/session.c | 5 +++-- 7 files changed, 45 insertions(+), 13 deletions(-) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 1643113616f4..c304d9b06190 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -795,8 +795,13 @@ static int count_lost_samples_event(const struct perf_tool *tool, evsel = evlist__id2evsel(rep->session->evlist, sample->id); if (evsel) { - hists__inc_nr_lost_samples(evsel__hists(evsel), - event->lost_samples.lost); + struct hists *hists = evsel__hists(evsel); + u32 count = event->lost_samples.lost; + + if (event->header.misc & PERF_RECORD_MISC_LOST_SAMPLES_BPF) + hists__inc_nr_dropped_samples(hists, count); + else + hists__inc_nr_lost_samples(hists, count); } return 0; } diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c index 9372e8904d22..74b2c619c56c 100644 --- a/tools/perf/ui/stdio/hist.c +++ b/tools/perf/ui/stdio/hist.c @@ -913,11 +913,11 @@ size_t events_stats__fprintf(struct events_stats *stats, FILE *fp) continue; if (i && total) { - ret += fprintf(fp, "%16s events: %10d (%4.1f%%)\n", + ret += fprintf(fp, "%20s events: %10d (%4.1f%%)\n", name, stats->nr_events[i], 100.0 * stats->nr_events[i] / total); } else { - ret += fprintf(fp, "%16s events: %10d\n", + ret += fprintf(fp, "%20s events: %10d\n", name, stats->nr_events[i]); } } diff --git a/tools/perf/util/events_stats.h b/tools/perf/util/events_stats.h index f43e5b1a366a..eabd7913c309 100644 --- a/tools/perf/util/events_stats.h +++ b/tools/perf/util/events_stats.h @@ -18,7 +18,18 @@ * PERF_RECORD_LOST_SAMPLES event. The number of lost-samples events is stored * in .nr_events[PERF_RECORD_LOST_SAMPLES] while total_lost_samples tells * exactly how many samples the kernel in fact dropped, i.e. it is the sum of - * all struct perf_record_lost_samples.lost fields reported. + * all struct perf_record_lost_samples.lost fields reported without setting the + * misc field in the header. + * + * The BPF program can discard samples according to the filter expressions given + * by the user. This number is kept in a BPF map and dumped at the end of perf + * record in a PERF_RECORD_LOST_SAMPLES event. To differentiate it from other + * lost samples, perf tools sets PERF_RECORD_MISC_LOST_SAMPLES_BPF flag in the + * header.misc field. The number of dropped-samples events is stored in + * .nr_events[PERF_RECORD_LOST_SAMPLES] while total_dropped_samples tells + * exactly how many samples the BPF program in fact dropped, i.e. it is the sum + * of all struct perf_record_lost_samples.lost fields reported with the misc + * field set in the header. * * The total_period is needed because by default auto-freq is used, so * multiplying nr_events[PERF_EVENT_SAMPLE] by a frequency isn't possible to get @@ -28,6 +39,7 @@ struct events_stats { u64 total_lost; u64 total_lost_samples; + u64 total_dropped_samples; u64 total_aux_lost; u64 total_aux_partial; u64 total_aux_collision; @@ -48,6 +60,7 @@ struct hists_stats { u32 nr_samples; u32 nr_non_filtered_samples; u32 nr_lost_samples; + u32 nr_dropped_samples; }; void events_stats__inc(struct events_stats *stats, u32 type); diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index 0f554febf9a1..29cbbd2c7d6f 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -2380,6 +2380,11 @@ void hists__inc_nr_lost_samples(struct hists *hists, u32 lost) hists->stats.nr_lost_samples += lost; } +void hists__inc_nr_dropped_samples(struct hists *hists, u32 lost) +{ + hists->stats.nr_dropped_samples += lost; +} + static struct hist_entry *hists__add_dummy_entry(struct hists *hists, struct hist_entry *pair) { @@ -2724,18 +2729,24 @@ size_t evlist__fprintf_nr_events(struct evlist *evlist, FILE *fp) evlist__for_each_entry(evlist, pos) { struct hists *hists = evsel__hists(pos); + u64 total_samples = hists->stats.nr_samples; + + total_samples += hists->stats.nr_lost_samples; + total_samples += hists->stats.nr_dropped_samples; - if (symbol_conf.skip_empty && !hists->stats.nr_samples && - !hists->stats.nr_lost_samples) + if (symbol_conf.skip_empty && total_samples == 0) continue; ret += fprintf(fp, "%s stats:\n", evsel__name(pos)); if (hists->stats.nr_samples) - ret += fprintf(fp, "%16s events: %10d\n", + ret += fprintf(fp, "%20s events: %10d\n", "SAMPLE", hists->stats.nr_samples); if (hists->stats.nr_lost_samples) - ret += fprintf(fp, "%16s events: %10d\n", + ret += fprintf(fp, "%20s events: %10d\n", "LOST_SAMPLES", hists->stats.nr_lost_samples); + if (hists->stats.nr_dropped_samples) + ret += fprintf(fp, "%20s events: %10d\n", + "LOST_SAMPLES (BPF)", hists->stats.nr_dropped_samples); } return ret; diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h index 30c13fc8cbe4..4ea247e54fb6 100644 --- a/tools/perf/util/hist.h +++ b/tools/perf/util/hist.h @@ -371,6 +371,7 @@ void hists__inc_stats(struct hists *hists, struct hist_entry *h); void hists__inc_nr_events(struct hists *hists); void hists__inc_nr_samples(struct hists *hists, bool filtered); void hists__inc_nr_lost_samples(struct hists *hists, u32 lost); +void hists__inc_nr_dropped_samples(struct hists *hists, u32 lost); size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows, int max_cols, float min_pcnt, FILE *fp, diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index c08774d06d14..00a49d0744fc 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -642,8 +642,9 @@ int machine__process_lost_event(struct machine *machine __maybe_unused, int machine__process_lost_samples_event(struct machine *machine __maybe_unused, union perf_event *event, struct perf_sample *sample) { - dump_printf(": id:%" PRIu64 ": lost samples :%" PRI_lu64 "\n", - sample->id, event->lost_samples.lost); + dump_printf(": id:%" PRIu64 ": lost samples :%" PRI_lu64 "%s\n", + sample->id, event->lost_samples.lost, + event->header.misc & PERF_RECORD_MISC_LOST_SAMPLES_BPF ? " (BPF)" : ""); return 0; } diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index d2bd563119bc..774eb3382000 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -1290,8 +1290,9 @@ static int machines__deliver_event(struct machines *machines, evlist->stats.total_lost += event->lost.lost; return tool->lost(tool, event, sample, machine); case PERF_RECORD_LOST_SAMPLES: - if (tool->lost_samples == perf_event__process_lost_samples && - !(event->header.misc & PERF_RECORD_MISC_LOST_SAMPLES_BPF)) + if (event->header.misc & PERF_RECORD_MISC_LOST_SAMPLES_BPF) + evlist->stats.total_dropped_samples += event->lost_samples.lost; + else if (tool->lost_samples == perf_event__process_lost_samples) evlist->stats.total_lost_samples += event->lost_samples.lost; return tool->lost_samples(tool, event, sample, machine); case PERF_RECORD_READ: