From patchwork Fri Apr 22 05:33:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Namhyung Kim X-Patchwork-Id: 12822752 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BF545C43217 for ; Fri, 22 Apr 2022 05:37:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232531AbiDVFkE (ORCPT ); Fri, 22 Apr 2022 01:40:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57874 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1444170AbiDVFg6 (ORCPT ); Fri, 22 Apr 2022 01:36:58 -0400 Received: from mail-pl1-x629.google.com (mail-pl1-x629.google.com [IPv6:2607:f8b0:4864:20::629]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9F44D4F448; Thu, 21 Apr 2022 22:34:05 -0700 (PDT) Received: by mail-pl1-x629.google.com with SMTP id h12so4830011plf.12; Thu, 21 Apr 2022 22:34:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=wqQRIgQO3ULhYUlV88GiQPjDyhm28U+v+Mde/WfWJ9Y=; b=qqEoI8eo8BjZkhFVwnf33n6mNuwkFG3qjyshIpZsVXo8tMiLniU0/PIaAOXuFvRYSz 8/vazd6OKW5VyOzzIi9NrRELj5I2osHK54ZJLG8EMuHZo6Ewn2tJyVW4UTzMlJEDmhE9 piEzbIny8mf4BoAhdE9EB0tS58Kqq1Q56wkjVyRtGOCOqgbl3fHB/QZQHeRxLXtMCoDz LcUNK1m2KgZo3TigsdmMQyfNI0KYZyYF9uoh8xZ6AQxUUazZcaNxGRBZ+gomGK5A+kP1 KTkHwvio0yar95EvFY5gjd173eXGR3xBi/sX3JijPVk5De/E+vR07UzO4I7U0lVVvFXq SpPg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references:mime-version:content-transfer-encoding; bh=wqQRIgQO3ULhYUlV88GiQPjDyhm28U+v+Mde/WfWJ9Y=; 
b=btpDOr/pmtQLDDh+dNA2XdzzN85ipT3VLGzg7kIdODNZN7lUAlFiByu11DoZsNIjpl u/xn6VBq+S5wYZqLdY4tCuDBk8e6UM6sTbINC7FoFnh2JVGsXAUDbYVsd4hJyPmyjczP hexAwQxh4zFbtdp/u2xMlMJl/sTqirsLjmxTV7ZBCyv6b9lCtjnGgaSJQV6wLtkqQi0a VY+TbDxwQHcj8PBkfx6opBWI9k6vH+mjOuFDmHgNe1hsu3fUdaoqwJjPYPsHJpRPuW1p 5jYUMzEkILcNxgT4QGrGdCPRFp06veqvIKMVsyIjhMr9cZct1VZ4hA6HcgCCCDCW5jBa 6lTw==
X-Gm-Message-State: AOAM532WwB1w1vyfqqTXXLGkYxBqxmTi9PcC1hVduXY0u6EIoXCj3Cce 4BHEPJJoG9kae4VBhv82Cq0=
X-Google-Smtp-Source: ABdhPJypVPj6HGfH5oNbElSsizeqDJta0gc+MwCQmem8TojkAPYSmM/tpM+1C/+vd0S0fnuRaM2Onw==
X-Received: by 2002:a17:902:aa06:b0:158:f13b:4859 with SMTP id be6-20020a170902aa0600b00158f13b4859mr2812716plb.141.1650605645099; Thu, 21 Apr 2022 22:34:05 -0700 (PDT)
Received: from balhae.hsd1.ca.comcast.net ([2601:647:4f00:3590:32e3:a023:46c1:80cd]) by smtp.gmail.com with ESMTPSA id 204-20020a6302d5000000b00385f29b02b2sm886519pgc.50.2022.04.21.22.34.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 21 Apr 2022 22:34:04 -0700 (PDT)
Sender: Namhyung Kim
From: Namhyung Kim
To: Arnaldo Carvalho de Melo, Jiri Olsa
Cc: Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Song Liu, Hao Luo, bpf@vger.kernel.org, linux-perf-users@vger.kernel.org, Blake Jones
Subject: [PATCH 1/4] perf report: Do not extend sample type of bpf-output event
Date: Thu, 21 Apr 2022 22:33:58 -0700
Message-Id: <20220422053401.208207-2-namhyung@kernel.org>
X-Mailer: git-send-email 2.36.0.rc2.479.g8af0fa9b8e-goog
In-Reply-To: <20220422053401.208207-1-namhyung@kernel.org>
References: <20220422053401.208207-1-namhyung@kernel.org>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org

Currently evsel__new_idx() sets more sample_type bits when it finds a
BPF-output event. But it should honor what's recorded in the perf
data file rather than blindly setting the bits. Otherwise it could
lead to a parse error when the data was recorded with a modified
sample_type.
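
For illustration only, the intended behavior can be sketched in plain C
outside of perf. The SKETCH_* macros and sketch_bpf_output_sample_type()
are hypothetical names, not perf code; the bit positions are assumed to
match enum perf_event_sample_format in the perf_event UAPI header.

```c
#include <stdint.h>

/* Assumed to mirror the PERF_SAMPLE_* bit positions in the UAPI header. */
#define SKETCH_SAMPLE_TIME    (1ULL << 2)
#define SKETCH_SAMPLE_CPU     (1ULL << 7)
#define SKETCH_SAMPLE_PERIOD  (1ULL << 8)
#define SKETCH_SAMPLE_RAW     (1ULL << 10)

#define SKETCH_BPF_OUTPUT_DEFAULT \
	(SKETCH_SAMPLE_RAW | SKETCH_SAMPLE_TIME | \
	 SKETCH_SAMPLE_CPU | SKETCH_SAMPLE_PERIOD)

/* Hypothetical helper: pick the sample_type for a bpf-output event. */
static uint64_t sketch_bpf_output_sample_type(uint64_t recorded)
{
	/* A non-zero value was already chosen (e.g. read from the file): keep it. */
	if (recorded)
		return recorded;

	/* Nothing was requested, so fall back to the defaults. */
	return SKETCH_BPF_OUTPUT_DEFAULT;
}
```

With this rule an event recorded with only the callchain bit keeps that
single bit, so the parser reads the sample with the layout that was
actually written.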
Signed-off-by: Namhyung Kim --- tools/perf/util/evsel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 2a1729e7aee4..5f947adc16cb 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -269,8 +269,8 @@ struct evsel *evsel__new_idx(struct perf_event_attr *attr, int idx) return NULL; evsel__init(evsel, attr, idx); - if (evsel__is_bpf_output(evsel)) { - evsel->core.attr.sample_type |= (PERF_SAMPLE_RAW | PERF_SAMPLE_TIME | + if (evsel__is_bpf_output(evsel) && !attr->sample_type) { + evsel->core.attr.sample_type = (PERF_SAMPLE_RAW | PERF_SAMPLE_TIME | PERF_SAMPLE_CPU | PERF_SAMPLE_PERIOD), evsel->core.attr.sample_period = 1; } From patchwork Fri Apr 22 05:33:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Namhyung Kim X-Patchwork-Id: 12822749 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8ED88C4321E for ; Fri, 22 Apr 2022 05:37:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231343AbiDVFjm (ORCPT ); Fri, 22 Apr 2022 01:39:42 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57888 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230237AbiDVFg7 (ORCPT ); Fri, 22 Apr 2022 01:36:59 -0400 Received: from mail-pl1-x62f.google.com (mail-pl1-x62f.google.com [IPv6:2607:f8b0:4864:20::62f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 29EEE4F445; Thu, 21 Apr 2022 22:34:07 -0700 (PDT) Received: by mail-pl1-x62f.google.com with SMTP id h12so4830120plf.12; Thu, 21 Apr 2022 22:34:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; 
h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=eGLKHmxjxA7RbR8RHgNL9UCWC8/SO4Ne5kRsH2k78iU=; b=qszOBEKp2ELVJNJRhGKrWqXOY3b+DBXR5/XBsiOtnuGbb4iFqrsjVNkhXleQZORYcO RgoMQuriRX3XFQ/oaWOa0M9XeJPN97FOEGOjsW91uXsHTsM9dnKpJGrrGQjRdkO99VwE 2MT20Qvz6lxUnB8T/ZhAc6JREgANoFobdxqnwEffo4YlGQkQ/bJ2dk4nW5cAC6YsGgjt SvqQvaGUGAvaa0YmiW5a7V5weMQY/Qy0Lg6v5jfdkhjFcf5q90OmPgCgGO8IzZsLBTCF C0YjeX/ZjeuMu6ayZzu3Bzxn7moEKbsp2yXnA0duJVG5/fsRPAhharMER52Xhm2ILC1e W2yQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references:mime-version:content-transfer-encoding; bh=eGLKHmxjxA7RbR8RHgNL9UCWC8/SO4Ne5kRsH2k78iU=; b=TR6+NcJ4sPDlwHmYjos2Ld+SxJW/inhm2Wfa9ApWe0QxmCEGdDXDrrfQwMup86PFJN m8HiJLEkiJ7KGyLsuXoJ2en+mawcYBT1naswg/gmRC6d9lI7T+N9S2v1WDdvrajmEPN6 ywnxiV8g0TpQ4q7N8MqeSj/d+b8on2u6tCXNcqU918J+7xcPVQ/bOpMyKFgZD+uESEDC 29KdReFGbwaa4PtjlbJvyGWoPCgZrT41G0OwWtESArTRHOF8zqHQTL7zKhi3hfc51F0S HSY3Uvlyo3Lb0RJvlps23xYcll0bF2KB78lOx4wAqvrifJKTGxp/+06GWTolKwHeOQ0q /ZEw== X-Gm-Message-State: AOAM531codtZEUdKz+eIRyYVmoktCQ3DsltmyM6x3oIoZRxVa2p4aSCd HzLAzkeuu61vKiA9R6+xL9A= X-Google-Smtp-Source: ABdhPJy3UUMntZCnSJi8DBgIEC1GM7oOJVJGZ0QH+kL74oR7ubEaSylhmftGJHGvc7qdWG04iaJ0xA== X-Received: by 2002:a17:903:11c7:b0:151:9769:3505 with SMTP id q7-20020a17090311c700b0015197693505mr2834718plh.72.1650605646539; Thu, 21 Apr 2022 22:34:06 -0700 (PDT) Received: from balhae.hsd1.ca.comcast.net ([2601:647:4f00:3590:32e3:a023:46c1:80cd]) by smtp.gmail.com with ESMTPSA id 204-20020a6302d5000000b00385f29b02b2sm886519pgc.50.2022.04.21.22.34.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 21 Apr 2022 22:34:06 -0700 (PDT) Sender: Namhyung Kim From: Namhyung Kim To: Arnaldo Carvalho de Melo , Jiri Olsa Cc: Ingo Molnar , Peter Zijlstra , LKML , Andi Kleen , Ian Rogers , Song Liu , Hao Luo , bpf@vger.kernel.org, 
linux-perf-users@vger.kernel.org, Blake Jones
Subject: [PATCH 2/4] perf record: Enable off-cpu analysis with BPF
Date: Thu, 21 Apr 2022 22:33:59 -0700
Message-Id: <20220422053401.208207-3-namhyung@kernel.org>
X-Mailer: git-send-email 2.36.0.rc2.479.g8af0fa9b8e-goog
In-Reply-To: <20220422053401.208207-1-namhyung@kernel.org>
References: <20220422053401.208207-1-namhyung@kernel.org>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org

Add an --off-cpu option to enable off-cpu profiling with BPF. It uses
a bpf_output event and renames it to "offcpu-time". Samples are
synthesized at the end of the record session using data from a BPF map
which contains the aggregated off-cpu time at context switches. So it
needs root privilege to use off-cpu profiling.

Each sample has a separate user stacktrace, so kernel threads are
skipped. The sample ip is set from the stacktrace and other sample
data is updated accordingly. Currently it only handles some basic
sample types.

The sample timestamp is set to a dummy value so that it does not
interfere with other events during sorting. It starts from a very big
initial value and is increased for each sample that is processed.

The good thing is that it can be used together with regular profiling
like cpu cycles. If you don't want that, you can use a dummy event to
enable off-cpu profiling only.

Example output:
  $ sudo perf record --off-cpu perf bench sched messaging -l 1000

  $ sudo perf report --stdio --call-graph=no
  # Total Lost Samples: 0
  #
  # Samples: 41K of event 'cycles'
  # Event count (approx.): 42137343851
  ...
  # Samples: 1K of event 'offcpu-time'
  # Event count (approx.): 587990831640
  #
  # Children      Self  Command          Shared Object       Symbol
  # ........  ........  ...............  ..................  .........................
  #
      81.66%     0.00%  sched-messaging  libc-2.33.so        [.] __libc_start_main
      81.66%     0.00%  sched-messaging  perf                [.] cmd_bench
      81.66%     0.00%  sched-messaging  perf                [.] main
      81.66%     0.00%  sched-messaging  perf                [.]
run_builtin 81.43% 0.00% sched-messaging perf [.] bench_sched_messaging 40.86% 40.86% sched-messaging libpthread-2.33.so [.] __read 37.66% 37.66% sched-messaging libpthread-2.33.so [.] __write 2.91% 2.91% sched-messaging libc-2.33.so [.] __poll ... As you can see it spent most of off-cpu time in read and write in bench_sched_messaging(). The --call-graph=no was added just to make the output concise here. It uses perf hooks facility to control BPF program during the record session rather than adding new BPF/off-cpu specific calls. Signed-off-by: Namhyung Kim --- tools/perf/Makefile.perf | 1 + tools/perf/builtin-record.c | 21 +++ tools/perf/util/Build | 1 + tools/perf/util/bpf_off_cpu.c | 208 +++++++++++++++++++++++++ tools/perf/util/bpf_skel/off_cpu.bpf.c | 137 ++++++++++++++++ 5 files changed, 368 insertions(+) create mode 100644 tools/perf/util/bpf_off_cpu.c create mode 100644 tools/perf/util/bpf_skel/off_cpu.bpf.c diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf index 69473a836bae..ce333327182a 100644 --- a/tools/perf/Makefile.perf +++ b/tools/perf/Makefile.perf @@ -1041,6 +1041,7 @@ SKEL_TMP_OUT := $(abspath $(SKEL_OUT)/.tmp) SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h SKELETONS += $(SKEL_OUT)/bperf_leader.skel.h $(SKEL_OUT)/bperf_follower.skel.h SKELETONS += $(SKEL_OUT)/bperf_cgroup.skel.h $(SKEL_OUT)/func_latency.skel.h +SKELETONS += $(SKEL_OUT)/off_cpu.skel.h $(SKEL_TMP_OUT) $(LIBBPF_OUTPUT): $(Q)$(MKDIR) -p $@ diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index ba74fab02e62..3d24d528ba8e 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -49,6 +49,7 @@ #include "util/clockid.h" #include "util/pmu-hybrid.h" #include "util/evlist-hybrid.h" +#include "util/off_cpu.h" #include "asm/bug.h" #include "perf.h" #include "cputopo.h" @@ -162,6 +163,7 @@ struct record { bool buildid_mmap; bool timestamp_filename; bool timestamp_boundary; + bool off_cpu; struct switch_output switch_output; unsigned 
long long samples; unsigned long output_max_size; /* = 0: unlimited */ @@ -903,6 +905,11 @@ static int record__config_text_poke(struct evlist *evlist) return 0; } +static int record__config_off_cpu(struct record *rec) +{ + return off_cpu_prepare(rec->evlist); +} + static bool record__kcore_readable(struct machine *machine) { char kcore[PATH_MAX]; @@ -2596,6 +2603,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) } else status = err; + if (rec->off_cpu) + rec->bytes_written += off_cpu_write(rec->session); + record__synthesize(rec, true); /* this will be recalculated during process_buildids() */ rec->samples = 0; @@ -3320,6 +3330,9 @@ static struct option __record_options[] = { OPT_CALLBACK_OPTARG(0, "threads", &record.opts, NULL, "spec", "write collected trace data into several data files using parallel threads", record__parse_threads), +#ifdef HAVE_BPF_SKEL + OPT_BOOLEAN(0, "off-cpu", &record.off_cpu, "Enable off-cpu analysis"), +#endif OPT_END() }; @@ -3968,6 +3981,14 @@ int cmd_record(int argc, const char **argv) } } + if (rec->off_cpu) { + err = record__config_off_cpu(rec); + if (err) { + pr_err("record__config_off_cpu failed, error %d\n", err); + goto out; + } + } + if (record_opts__config(&rec->opts)) { err = -EINVAL; goto out; diff --git a/tools/perf/util/Build b/tools/perf/util/Build index 9a7209a99e16..a51267d88ca9 100644 --- a/tools/perf/util/Build +++ b/tools/perf/util/Build @@ -147,6 +147,7 @@ perf-$(CONFIG_LIBBPF) += bpf_map.o perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter.o perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter_cgroup.o perf-$(CONFIG_PERF_BPF_SKEL) += bpf_ftrace.o +perf-$(CONFIG_PERF_BPF_SKEL) += bpf_off_cpu.o perf-$(CONFIG_BPF_PROLOGUE) += bpf-prologue.o perf-$(CONFIG_LIBELF) += symbol-elf.o perf-$(CONFIG_LIBELF) += probe-file.o diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c new file mode 100644 index 000000000000..1f87d2a9b86d --- /dev/null +++ b/tools/perf/util/bpf_off_cpu.c @@ -0,0 +1,208 
@@ +// SPDX-License-Identifier: GPL-2.0 +#include "util/bpf_counter.h" +#include "util/debug.h" +#include "util/evsel.h" +#include "util/evlist.h" +#include "util/off_cpu.h" +#include "util/perf-hooks.h" +#include "util/session.h" +#include + +#include "bpf_skel/off_cpu.skel.h" + +#define MAX_STACKS 32 +/* we don't need actual timestamp, just want to put the samples at last */ +#define OFF_CPU_TIMESTAMP (~0ull << 32) + +static struct off_cpu_bpf *skel; + +struct off_cpu_key { + u32 pid; + u32 tgid; + u32 stack_id; + u32 state; + u64 cgroup_id; +}; + +union off_cpu_data { + struct perf_event_header hdr; + u64 array[1024 / sizeof(u64)]; +}; + +static int off_cpu_config(struct evlist *evlist) +{ + struct evsel *evsel; + struct perf_event_attr attr = { + .type = PERF_TYPE_SOFTWARE, + .config = PERF_COUNT_SW_BPF_OUTPUT, + .size = sizeof(attr), /* to capture ABI version */ + }; + + evsel = evsel__new(&attr); + if (!evsel) + return -ENOMEM; + + evsel->core.attr.freq = 1; + evsel->core.attr.sample_period = 1; + /* off-cpu analysis depends on stack trace */ + evsel->core.attr.sample_type = PERF_SAMPLE_CALLCHAIN; + + evlist__add(evlist, evsel); + + free(evsel->name); + evsel->name = strdup("offcpu-time"); + if (evsel->name == NULL) + return -ENOMEM; + + return 0; +} + +static void off_cpu_start(void *arg __maybe_unused) +{ + skel->bss->enabled = 1; +} + +static void off_cpu_finish(void *arg __maybe_unused) +{ + skel->bss->enabled = 0; + off_cpu_bpf__destroy(skel); +} + +int off_cpu_prepare(struct evlist *evlist) +{ + int err; + + if (off_cpu_config(evlist) < 0) { + pr_err("Failed to config off-cpu BPF event\n"); + return -1; + } + + set_max_rlimit(); + + skel = off_cpu_bpf__open_and_load(); + if (!skel) { + pr_err("Failed to open off-cpu skeleton\n"); + return -1; + } + + err = off_cpu_bpf__attach(skel); + if (err) { + pr_err("Failed to attach off-cpu skeleton\n"); + goto out; + } + + if (perf_hooks__set_hook("record_start", off_cpu_start, NULL) || + 
perf_hooks__set_hook("record_end", off_cpu_finish, NULL)) { + pr_err("Failed to attach off-cpu skeleton\n"); + goto out; + } + + return 0; + +out: + off_cpu_bpf__destroy(skel); + return -1; +} + +int off_cpu_write(struct perf_session *session) +{ + int bytes = 0, size; + int fd, stack; + bool found = false; + u64 sample_type, val, sid = 0; + struct evsel *evsel; + struct perf_data_file *file = &session->data->file; + struct off_cpu_key prev, key; + union off_cpu_data data = { + .hdr = { + .type = PERF_RECORD_SAMPLE, + .misc = PERF_RECORD_MISC_USER, + }, + }; + u64 tstamp = OFF_CPU_TIMESTAMP; + + skel->bss->enabled = 0; + + evlist__for_each_entry(session->evlist, evsel) { + if (!strcmp(evsel__name(evsel), "offcpu-time")) { + found = true; + break; + } + } + + if (!found) { + pr_err("offcpu-time evsel not found\n"); + return 0; + } + + sample_type = evsel->core.attr.sample_type; + + if (sample_type & (PERF_SAMPLE_ID | PERF_SAMPLE_IDENTIFIER)) { + if (evsel->core.id) + sid = evsel->core.id[0]; + } + + fd = bpf_map__fd(skel->maps.off_cpu); + stack = bpf_map__fd(skel->maps.stacks); + memset(&prev, 0, sizeof(prev)); + + while (!bpf_map_get_next_key(fd, &prev, &key)) { + int n = 1; /* start from perf_event_header */ + int ip_pos = -1; + + bpf_map_lookup_elem(fd, &key, &val); + + if (sample_type & PERF_SAMPLE_IDENTIFIER) + data.array[n++] = sid; + if (sample_type & PERF_SAMPLE_IP) { + ip_pos = n; + data.array[n++] = 0; /* will be updated */ + } + if (sample_type & PERF_SAMPLE_TID) + data.array[n++] = (u64)key.pid << 32 | key.tgid; + if (sample_type & PERF_SAMPLE_TIME) + data.array[n++] = tstamp; + if (sample_type & PERF_SAMPLE_ID) + data.array[n++] = sid; + if (sample_type & PERF_SAMPLE_CPU) + data.array[n++] = 0; + if (sample_type & PERF_SAMPLE_PERIOD) + data.array[n++] = val; + if (sample_type & PERF_SAMPLE_CALLCHAIN) { + int len = 0; + + /* data.array[n] is callchain->nr (updated later) */ + data.array[n + 1] = PERF_CONTEXT_USER; + data.array[n + 2] = 0; + + 
bpf_map_lookup_elem(stack, &key.stack_id, &data.array[n + 2]); + while (data.array[n + 2 + len]) + len++; + + /* update length of callchain */ + data.array[n] = len + 1; + + /* update sample ip with the first callchain entry */ + if (ip_pos >= 0) + data.array[ip_pos] = data.array[n + 2]; + + /* calculate sample callchain data array length */ + n += len + 2; + } + /* TODO: handle more sample types */ + + size = n * sizeof(u64); + data.hdr.size = size; + bytes += size; + + if (perf_data_file__write(file, &data, size) < 0) { + pr_err("failed to write perf data, error: %m\n"); + return bytes; + } + + prev = key; + /* increase dummy timestamp to sort later samples */ + tstamp++; + } + return bytes; +} diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c new file mode 100644 index 000000000000..2bc6f7cc59ea --- /dev/null +++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c @@ -0,0 +1,137 @@ +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +// Copyright (c) 2022 Google +#include "vmlinux.h" +#include +#include +#include + +/* task->flags for off-cpu analysis */ +#define PF_KTHREAD 0x00200000 /* I am a kernel thread */ + +/* task->state for off-cpu analysis */ +#define TASK_INTERRUPTIBLE 0x0001 +#define TASK_UNINTERRUPTIBLE 0x0002 + +#define MAX_STACKS 32 +#define MAX_ENTRIES 102400 + +struct tstamp_data { + __u32 stack_id; + __u32 state; + __u64 timestamp; +}; + +struct offcpu_key { + __u32 pid; + __u32 tgid; + __u32 stack_id; + __u32 state; +}; + +struct { + __uint(type, BPF_MAP_TYPE_STACK_TRACE); + __uint(key_size, sizeof(__u32)); + __uint(value_size, MAX_STACKS * sizeof(__u64)); + __uint(max_entries, MAX_ENTRIES); +} stacks SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(struct tstamp_data)); + __uint(max_entries, MAX_ENTRIES); +} tstamp SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(struct offcpu_key)); + 
__uint(value_size, sizeof(__u64)); + __uint(max_entries, MAX_ENTRIES); +} off_cpu SEC(".maps"); + +/* old kernel task_struct definition */ +struct task_struct___old { + long state; +} __attribute__((preserve_access_index)); + +int enabled = 0; + +/* + * recently task_struct->state renamed to __state so it made an incompatible + * change. Use BPF CO-RE "ignored suffix rule" to deal with it like below: + * + * https://nakryiko.com/posts/bpf-core-reference-guide/#handling-incompatible-field-and-type-changes + */ +static inline int get_task_state(struct task_struct *t) +{ + if (bpf_core_field_exists(t->__state)) + return BPF_CORE_READ(t, __state); + + /* recast pointer to capture task_struct___old type for compiler */ + struct task_struct___old *t_old = (void *)t; + + /* now use old "state" name of the field */ + return BPF_CORE_READ(t_old, state); +} + +SEC("tp_btf/sched_switch") +int on_switch(u64 *ctx) +{ + __u64 ts; + int state; + __u32 pid, stack_id; + struct task_struct *prev, *next; + struct tstamp_data elem, *pelem; + + if (!enabled) + return 0; + + prev = (struct task_struct *)ctx[1]; + next = (struct task_struct *)ctx[2]; + state = get_task_state(prev); + + ts = bpf_ktime_get_ns(); + + if (prev->flags & PF_KTHREAD) + goto next; + if (state != TASK_INTERRUPTIBLE && + state != TASK_UNINTERRUPTIBLE) + goto next; + + stack_id = bpf_get_stackid(ctx, &stacks, + BPF_F_FAST_STACK_CMP | BPF_F_USER_STACK); + + elem.timestamp = ts; + elem.state = state; + elem.stack_id = stack_id; + + pid = prev->pid; + bpf_map_update_elem(&tstamp, &pid, &elem, BPF_ANY); + +next: + pid = next->pid; + pelem = bpf_map_lookup_elem(&tstamp, &pid); + + if (pelem) { + struct offcpu_key key = { + .pid = next->pid, + .tgid = next->tgid, + .stack_id = pelem->stack_id, + .state = pelem->state, + }; + __u64 delta = ts - pelem->timestamp; + __u64 *total; + + total = bpf_map_lookup_elem(&off_cpu, &key); + if (total) + *total += delta; + else + bpf_map_update_elem(&off_cpu, &key, &delta, BPF_ANY); + 
+ bpf_map_delete_elem(&tstamp, &pid); + } + + return 0; +} + +char LICENSE[] SEC("license") = "Dual BSD/GPL"; From patchwork Fri Apr 22 05:34:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Namhyung Kim X-Patchwork-Id: 12822751 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 00E7FC433EF for ; Fri, 22 Apr 2022 05:37:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232438AbiDVFkA (ORCPT ); Fri, 22 Apr 2022 01:40:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57894 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1444172AbiDVFhA (ORCPT ); Fri, 22 Apr 2022 01:37:00 -0400 Received: from mail-pl1-x62c.google.com (mail-pl1-x62c.google.com [IPv6:2607:f8b0:4864:20::62c]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id AEACE4F448; Thu, 21 Apr 2022 22:34:08 -0700 (PDT) Received: by mail-pl1-x62c.google.com with SMTP id u15so5115687ple.4; Thu, 21 Apr 2022 22:34:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=em5/zodA7layme8bSqQKMP2f6/tfNS7ip5sBkTMrN9A=; b=KHZZ22qesd+CEISUYdFzi/FRutHa+q7TOp4ZVXeLvsIlanz8UcRxkz155LV4hu4BPH Bh4yu0QLs20Z31QlvRc9CiswHI9xIQ8kujCXzcNmRuqFNUCoFNdTCoEDSNBS3cVUGXJV r9gZmPrLoecxN3TjRrF6Cf/LcuYw55cRxLvhK1rnHAreb6E0NrUi4ht2JpCxRHKBriyh mDRBE70pOeSd8WvXKJhxLJAaisb/MEO/PJu65Q5dYDTBebmXr1YQ0IXYRmVZ02u/RTWr VmRInTKR9xndu4YulM+vyKdBOKwxFeKtvIhmD3PP/BzqxpYGU4rWC3srJ7VnzSAAPvgU BPwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id 
:in-reply-to:references:mime-version:content-transfer-encoding; bh=em5/zodA7layme8bSqQKMP2f6/tfNS7ip5sBkTMrN9A=; b=TXQsmN7wycrFhmgKE91SVXSn1akczCYuXPcNYPVutKyEcXWGZaIWPpPnyzByGx/gL5 nGjLbQt7LPwwGZ28tPAZKK+BiorfGwnZsejs0cxzRHFhPcx9WzhknsQOugVagbPMUyYY FI58TlKZm3gddnok6UWOSLM6gqJbsemobz66M+SfaJq3nq2XLw/EpRDTrT1Xe4MW92Gr TS9rp1uD77U1EtDaHNH/+uyY7DEbMJFYL4ldLX4/Ad9WHP0zRoekoVaWy8o+d67v2ZkP 9ivzKbg5Rrh7FM7u2MrO3De1AUNl2w/+9rqFKzAlqZ3cLumFPJz8nih8ZmASsnJaPmkK Ve9Q== X-Gm-Message-State: AOAM5330MHUQZHVGBjuJtdjW0wEzERQGezlLBtxILhM/yL4u0R7ykqI2 EEl4UFTt/NnUepm2yLx3LDg= X-Google-Smtp-Source: ABdhPJzbOYKR6l9hGWbRogqQB/Harv/b9Lyt2KCiBOt6R93CV7V+sDrAIl70df7Bs3CDdP2hKFd+IQ== X-Received: by 2002:a17:903:28d:b0:158:ee84:e588 with SMTP id j13-20020a170903028d00b00158ee84e588mr2820672plr.60.1650605648233; Thu, 21 Apr 2022 22:34:08 -0700 (PDT) Received: from balhae.hsd1.ca.comcast.net ([2601:647:4f00:3590:32e3:a023:46c1:80cd]) by smtp.gmail.com with ESMTPSA id 204-20020a6302d5000000b00385f29b02b2sm886519pgc.50.2022.04.21.22.34.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 21 Apr 2022 22:34:07 -0700 (PDT) Sender: Namhyung Kim From: Namhyung Kim To: Arnaldo Carvalho de Melo , Jiri Olsa Cc: Ingo Molnar , Peter Zijlstra , LKML , Andi Kleen , Ian Rogers , Song Liu , Hao Luo , bpf@vger.kernel.org, linux-perf-users@vger.kernel.org, Blake Jones Subject: [PATCH 3/4] perf record: Implement basic filtering for off-cpu Date: Thu, 21 Apr 2022 22:34:00 -0700 Message-Id: <20220422053401.208207-4-namhyung@kernel.org> X-Mailer: git-send-email 2.36.0.rc2.479.g8af0fa9b8e-goog In-Reply-To: <20220422053401.208207-1-namhyung@kernel.org> References: <20220422053401.208207-1-namhyung@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org It should honor cpu and task filtering with -a, -C or -p, -t options. 
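
As a rough model of the filtering rule (not the BPF code in the diff),
the sketch below replaces the cpu_filter and task_filter hash maps with
plain arrays. struct sketch_filter and both functions are hypothetical
names; the "active" field plays the role of the has_cpu and has_task
flags used by the BPF program.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one of the filter hash maps. */
struct sketch_filter {
	bool active;          /* corresponds to has_cpu / has_task */
	const uint32_t *ids;  /* allowed CPU numbers or task ids */
	size_t nr;            /* number of entries in ids */
};

static bool sketch_filter_allows(const struct sketch_filter *f, uint32_t id)
{
	size_t i;

	if (!f->active)
		return true;  /* no filter requested, e.g. system-wide mode */

	for (i = 0; i < f->nr; i++) {
		if (f->ids[i] == id)
			return true;
	}
	return false;
}

/* A context switch is considered only if both filters let it through. */
static bool sketch_can_record(const struct sketch_filter *cpus,
			      const struct sketch_filter *tasks,
			      uint32_t cpu, uint32_t pid)
{
	return sketch_filter_allows(cpus, cpu) &&
	       sketch_filter_allows(tasks, pid);
}
```

An inactive filter accepts everything, which is why nothing needs to be
populated when no CPU list or task is given.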
Signed-off-by: Namhyung Kim --- tools/perf/builtin-record.c | 2 +- tools/perf/util/bpf_off_cpu.c | 78 +++++++++++++++++++++++--- tools/perf/util/bpf_skel/off_cpu.bpf.c | 52 +++++++++++++++-- 3 files changed, 119 insertions(+), 13 deletions(-) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 3d24d528ba8e..592384e058c3 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -907,7 +907,7 @@ static int record__config_text_poke(struct evlist *evlist) static int record__config_off_cpu(struct record *rec) { - return off_cpu_prepare(rec->evlist); + return off_cpu_prepare(rec->evlist, &rec->opts.target); } static bool record__kcore_readable(struct machine *machine) diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c index 1f87d2a9b86d..89f36229041d 100644 --- a/tools/perf/util/bpf_off_cpu.c +++ b/tools/perf/util/bpf_off_cpu.c @@ -6,6 +6,9 @@ #include "util/off_cpu.h" #include "util/perf-hooks.h" #include "util/session.h" +#include "util/target.h" +#include "util/cpumap.h" +#include "util/thread_map.h" #include #include "bpf_skel/off_cpu.skel.h" @@ -57,8 +60,23 @@ static int off_cpu_config(struct evlist *evlist) return 0; } -static void off_cpu_start(void *arg __maybe_unused) +static void off_cpu_start(void *arg) { + struct evlist *evlist = arg; + + /* update task filter for the given workload */ + if (!skel->bss->has_cpu && !skel->bss->has_task && + perf_thread_map__pid(evlist->core.threads, 0) != -1) { + int fd; + u32 pid; + u8 val = 1; + + skel->bss->has_task = 1; + fd = bpf_map__fd(skel->maps.task_filter); + pid = perf_thread_map__pid(evlist->core.threads, 0); + bpf_map_update_elem(fd, &pid, &val, BPF_ANY); + } + skel->bss->enabled = 1; } @@ -68,31 +86,75 @@ static void off_cpu_finish(void *arg __maybe_unused) off_cpu_bpf__destroy(skel); } -int off_cpu_prepare(struct evlist *evlist) +int off_cpu_prepare(struct evlist *evlist, struct target *target) { - int err; + int err, fd, i; + int ncpus = 1, 
ntasks = 1; if (off_cpu_config(evlist) < 0) { pr_err("Failed to config off-cpu BPF event\n"); return -1; } - set_max_rlimit(); - - skel = off_cpu_bpf__open_and_load(); + skel = off_cpu_bpf__open(); if (!skel) { pr_err("Failed to open off-cpu skeleton\n"); return -1; } + /* don't need to set cpu filter for system-wide mode */ + if (target->cpu_list) { + ncpus = perf_cpu_map__nr(evlist->core.user_requested_cpus); + bpf_map__set_max_entries(skel->maps.cpu_filter, ncpus); + } + + if (target__has_task(target)) { + ntasks = perf_thread_map__nr(evlist->core.threads); + bpf_map__set_max_entries(skel->maps.task_filter, ntasks); + } + + set_max_rlimit(); + + err = off_cpu_bpf__load(skel); + if (err) { + pr_err("Failed to load off-cpu skeleton\n"); + goto out; + } + + if (target->cpu_list) { + u32 cpu; + u8 val = 1; + + skel->bss->has_cpu = 1; + fd = bpf_map__fd(skel->maps.cpu_filter); + + for (i = 0; i < ncpus; i++) { + cpu = perf_cpu_map__cpu(evlist->core.user_requested_cpus, i).cpu; + bpf_map_update_elem(fd, &cpu, &val, BPF_ANY); + } + } + + if (target__has_task(target)) { + u32 pid; + u8 val = 1; + + skel->bss->has_task = 1; + fd = bpf_map__fd(skel->maps.task_filter); + + for (i = 0; i < ntasks; i++) { + pid = perf_thread_map__pid(evlist->core.threads, i); + bpf_map_update_elem(fd, &pid, &val, BPF_ANY); + } + } + err = off_cpu_bpf__attach(skel); if (err) { pr_err("Failed to attach off-cpu skeleton\n"); goto out; } - if (perf_hooks__set_hook("record_start", off_cpu_start, NULL) || - perf_hooks__set_hook("record_end", off_cpu_finish, NULL)) { + if (perf_hooks__set_hook("record_start", off_cpu_start, evlist) || + perf_hooks__set_hook("record_end", off_cpu_finish, evlist)) { pr_err("Failed to attach off-cpu skeleton\n"); goto out; } diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c index 2bc6f7cc59ea..27425fe361e2 100644 --- a/tools/perf/util/bpf_skel/off_cpu.bpf.c +++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c @@ -49,12 +49,28 @@ 
struct { __uint(max_entries, MAX_ENTRIES); } off_cpu SEC(".maps"); +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u8)); + __uint(max_entries, 1); +} cpu_filter SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u8)); + __uint(max_entries, 1); +} task_filter SEC(".maps"); + /* old kernel task_struct definition */ struct task_struct___old { long state; } __attribute__((preserve_access_index)); int enabled = 0; +int has_cpu = 0; +int has_task = 0; /* * recently task_struct->state renamed to __state so it made an incompatible @@ -74,6 +90,37 @@ static inline int get_task_state(struct task_struct *t) return BPF_CORE_READ(t_old, state); } +static inline int can_record(struct task_struct *t, int state) +{ + if (has_cpu) { + __u32 cpu = bpf_get_smp_processor_id(); + __u8 *ok; + + ok = bpf_map_lookup_elem(&cpu_filter, &cpu); + if (!ok) + return 0; + } + + if (has_task) { + __u8 *ok; + __u32 pid = t->pid; + + ok = bpf_map_lookup_elem(&task_filter, &pid); + if (!ok) + return 0; + } + + /* kernel threads don't have user stack */ + if (t->flags & PF_KTHREAD) + return 0; + + if (state != TASK_INTERRUPTIBLE && + state != TASK_UNINTERRUPTIBLE) + return 0; + + return 1; +} + SEC("tp_btf/sched_switch") int on_switch(u64 *ctx) { @@ -92,10 +139,7 @@ int on_switch(u64 *ctx) ts = bpf_ktime_get_ns(); - if (prev->flags & PF_KTHREAD) - goto next; - if (state != TASK_INTERRUPTIBLE && - state != TASK_UNINTERRUPTIBLE) + if (!can_record(prev, state)) goto next; stack_id = bpf_get_stackid(ctx, &stacks, From patchwork Fri Apr 22 05:34:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Namhyung Kim X-Patchwork-Id: 12822750 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org 
[23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id A2C9AC433EF for ; Fri, 22 Apr 2022 05:37:10 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230475AbiDVFjz (ORCPT ); Fri, 22 Apr 2022 01:39:55 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57906 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1444173AbiDVFhC (ORCPT ); Fri, 22 Apr 2022 01:37:02 -0400 Received: from mail-pl1-x631.google.com (mail-pl1-x631.google.com [IPv6:2607:f8b0:4864:20::631]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 617384F445; Thu, 21 Apr 2022 22:34:10 -0700 (PDT) Received: by mail-pl1-x631.google.com with SMTP id t12so8501183pll.7; Thu, 21 Apr 2022 22:34:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=G5bMd3EZKPtSk/LlVpFY7y+BRjO9pioZ19YkfHn+mvI=; b=WBLAbPZKUQi9ZgV2hCFU/S/aOyaHcoxwMWeYgoL9BLhrNE2qGrPb69TGpB8BP3Le5Z CT8/HKaERwuz/5nNo59SY2TOpDTeA6J/Gpbx2Lc2EWLXzoduGLEHQp5QIFWtwqswioBC lNuwJvBoypAl08VAkDhJ8mBLF/+RHoKZBpgxRTFvcqs73F2Kt+QsNb40FuhNsOTDEbVg kO7O7MBTYtWtlXwIzuDBvY3z7OTS/0JCXPmPvBMYcRQPsXPMZs1yHV0rxA+XNlCdBLsA 2+4F1Wz7dbvbkV9OkiNzgJQW+h8WKYOTV8l/SO+1IbvAXdJooMcS8qj6drdi70EM9zmp RDBQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references:mime-version:content-transfer-encoding; bh=G5bMd3EZKPtSk/LlVpFY7y+BRjO9pioZ19YkfHn+mvI=; b=4DIcfVHR4AL3m7i3WcuZvQjMZ3MemqPX/XEFWZ4uTRaXaIDW/csMlmSU2oGhqt00yJ Mnd9Hil0u4Vg3dzBkVL8AeDQfmZ9GeokZZiu/rUe7CWb/Dj7pr0t1QoVbCHIw5BRr4ht wN6v8PS2DTipz2eZc/t5IzWdrxujamdHMo84aS5nYBVEX9PuCZ+7wwefZdiNATH9FLeb Iu0OvskF8F88fPAk9P0X20Xc8skX2xNDdmLAJMiu3STeHzP2sMMk76Was9i1ldDimR+5 S4idhJa+xGhS+I+P3fIk0/Pd88CEmvuja09TzCoH6FicBvIqUoBn08k8wMCV8jj9G7Nv Qevg== X-Gm-Message-State: 
AOAM532dc5tjIb6zyn/d2kEgH48ik27yImU8lV30Ed4+UaQV0D7fJFc2 wQHGTA1vrGS1Zbom5r1H4NM=
X-Google-Smtp-Source: ABdhPJwLc4FzlEoCuD5URWbioHefbgZfrShmSLUhNUgKFAik7IRzv1TbY3Yjd5ryQ4rB6Oo4zgnUFQ==
X-Received: by 2002:a17:90b:4b05:b0:1d2:3d1e:fbfb with SMTP id lx5-20020a17090b4b0500b001d23d1efbfbmr14430682pjb.33.1650605649848; Thu, 21 Apr 2022 22:34:09 -0700 (PDT)
Received: from balhae.hsd1.ca.comcast.net ([2601:647:4f00:3590:32e3:a023:46c1:80cd]) by smtp.gmail.com with ESMTPSA id 204-20020a6302d5000000b00385f29b02b2sm886519pgc.50.2022.04.21.22.34.08 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 21 Apr 2022 22:34:09 -0700 (PDT)
Sender: Namhyung Kim
From: Namhyung Kim
To: Arnaldo Carvalho de Melo, Jiri Olsa
Cc: Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers, Song Liu, Hao Luo, bpf@vger.kernel.org, linux-perf-users@vger.kernel.org, Blake Jones
Subject: [PATCH 4/4] perf record: Handle argument change in sched_switch
Date: Thu, 21 Apr 2022 22:34:01 -0700
Message-Id: <20220422053401.208207-5-namhyung@kernel.org>
X-Mailer: git-send-email 2.36.0.rc2.479.g8af0fa9b8e-goog
In-Reply-To: <20220422053401.208207-1-namhyung@kernel.org>
References: <20220422053401.208207-1-namhyung@kernel.org>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: bpf@vger.kernel.org

The sched_switch tracepoint recently gained a new prev_state argument,
but it is hard to handle that change inside a BPF program. Instead, we
can check the function prototype in BTF before loading the program. So
I made two copies of the tracepoint handler and select one of them
based on the BTF info.

Signed-off-by: Namhyung Kim
---
 tools/perf/util/bpf_off_cpu.c          | 32 +++++++++++++++
 tools/perf/util/bpf_skel/off_cpu.bpf.c | 55 ++++++++++++++++++++------
 2 files changed, 76 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c
index 89f36229041d..38aeb13d3d25 100644
--- a/tools/perf/util/bpf_off_cpu.c
+++ b/tools/perf/util/bpf_off_cpu.c
@@ -86,6 +86,37 @@ static void off_cpu_finish(void *arg __maybe_unused)
 	off_cpu_bpf__destroy(skel);
 }
 
+/* recent kernel added prev_state arg, so it needs to call the proper function */
+static void check_sched_switch_args(void)
+{
+	const struct btf *btf = bpf_object__btf(skel->obj);
+	const struct btf_type *t1, *t2, *t3;
+	u32 type_id;
+
+	type_id = btf__find_by_name_kind(btf, "bpf_trace_sched_switch",
+					 BTF_KIND_TYPEDEF);
+	if ((s32)type_id < 0)
+		goto old_format;
+
+	t1 = btf__type_by_id(btf, type_id);
+	if (t1 == NULL)
+		goto old_format;
+
+	t2 = btf__type_by_id(btf, t1->type);
+	if (t2 == NULL || !btf_is_ptr(t2))
+		goto old_format;
+
+	t3 = btf__type_by_id(btf, t2->type);
+	if (t3 && btf_is_func_proto(t3) && btf_vlen(t3) == 4) {
+		/* new format: disable old functions */
+		bpf_program__set_autoload(skel->progs.on_switch3, false);
+		return;
+	}
+
+old_format:
+	bpf_program__set_autoload(skel->progs.on_switch4, false);
+}
+
 int off_cpu_prepare(struct evlist *evlist, struct target *target)
 {
 	int err, fd, i;
@@ -114,6 +145,7 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target)
 	}
 
 	set_max_rlimit();
+	check_sched_switch_args();
 
 	err = off_cpu_bpf__load(skel);
 	if (err) {
diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c
index 27425fe361e2..e11e198af86f 100644
--- a/tools/perf/util/bpf_skel/off_cpu.bpf.c
+++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c
@@ -121,22 +121,13 @@ static inline int can_record(struct task_struct *t, int state)
 	return 1;
 }
 
-SEC("tp_btf/sched_switch")
-int on_switch(u64 *ctx)
+static int on_switch(u64 *ctx, struct task_struct *prev,
+		     struct task_struct *next, int state)
 {
 	__u64 ts;
-	int state;
 	__u32 pid, stack_id;
-	struct task_struct *prev, *next;
 	struct tstamp_data elem, *pelem;
 
-	if (!enabled)
-		return 0;
-
-	prev = (struct task_struct *)ctx[1];
-	next = (struct task_struct *)ctx[2];
-	state = get_task_state(prev);
-
 	ts = bpf_ktime_get_ns();
 
 	if (!can_record(prev, state))
@@ -178,4 +169,46 @@ int on_switch(u64 *ctx)
 	return 0;
 }
 
+SEC("tp_btf/sched_switch")
+int on_switch3(u64 *ctx)
+{
+	struct task_struct *prev, *next;
+	int state;
+
+	if (!enabled)
+		return 0;
+
+	/*
+	 * TP_PROTO(bool preempt, struct task_struct *prev,
+	 *          struct task_struct *next)
+	 */
+	prev = (struct task_struct *)ctx[1];
+	next = (struct task_struct *)ctx[2];
+
+	state = get_task_state(prev);
+
+	return on_switch(ctx, prev, next, state);
+}
+
+SEC("tp_btf/sched_switch")
+int on_switch4(u64 *ctx)
+{
+	struct task_struct *prev, *next;
+	int prev_state;
+
+	if (!enabled)
+		return 0;
+
+	/*
+	 * TP_PROTO(bool preempt, int prev_state,
+	 *          struct task_struct *prev,
+	 *          struct task_struct *next)
+	 */
+	prev = (struct task_struct *)ctx[2];
+	next = (struct task_struct *)ctx[3];
+	prev_state = (int)ctx[1];
+
+	return on_switch(ctx, prev, next, prev_state);
+}
+
 char LICENSE[] SEC("license") = "Dual BSD/GPL";
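
For readers who want to see the selection logic in isolation, here is a
minimal standalone sketch of the typedef -> pointer -> function prototype
walk that check_sched_switch_args() performs. It does not use libbpf:
every toy_* name, the type table layout, and the sample tables are
hypothetical stand-ins invented for illustration, whereas the patch itself
relies on btf__find_by_name_kind(), btf__type_by_id(), btf_is_ptr(),
btf_is_func_proto() and btf_vlen(). The sketch also checks the typedef kind
explicitly, which the patch gets from the name/kind lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for BTF kinds. These are NOT the libbpf definitions. */
enum toy_kind { TOY_INT, TOY_TYPEDEF, TOY_PTR, TOY_FUNC_PROTO };

/* Which handler stays enabled, named after the two programs in the patch. */
enum toy_handler { TOY_ON_SWITCH3, TOY_ON_SWITCH4 };

struct toy_type {
	enum toy_kind kind;
	int ref;	/* index of the referenced type, or -1 if none */
	int nr_params;	/* only meaningful for TOY_FUNC_PROTO */
};

/* Return the type at index id, or NULL when id is out of range. */
static const struct toy_type *toy_type_by_id(const struct toy_type *tab,
					     int nr, int id)
{
	return (id >= 0 && id < nr) ? &tab[id] : NULL;
}

/*
 * Walk typedef -> pointer -> function prototype starting at typedef_id.
 * Only a prototype with exactly four parameters selects the four-argument
 * handler; any broken link falls back to the three-argument handler,
 * like the old_format label in the patch.
 */
static enum toy_handler toy_select_handler(const struct toy_type *tab,
					   int nr, int typedef_id)
{
	const struct toy_type *t1, *t2, *t3;

	assert(tab != NULL);

	t1 = toy_type_by_id(tab, nr, typedef_id);
	if (t1 == NULL || t1->kind != TOY_TYPEDEF)
		return TOY_ON_SWITCH3;

	t2 = toy_type_by_id(tab, nr, t1->ref);
	if (t2 == NULL || t2->kind != TOY_PTR)
		return TOY_ON_SWITCH3;

	t3 = toy_type_by_id(tab, nr, t2->ref);
	if (t3 != NULL && t3->kind == TOY_FUNC_PROTO && t3->nr_params == 4)
		return TOY_ON_SWITCH4;

	return TOY_ON_SWITCH3;
}

/* Illustrative tables; index 0 is the typedef in both. */
static const struct toy_type toy_four_params[] = {
	{ TOY_TYPEDEF, 1, 0 },
	{ TOY_PTR, 2, 0 },
	{ TOY_FUNC_PROTO, -1, 4 },
};

static const struct toy_type toy_three_params[] = {
	{ TOY_TYPEDEF, 1, 0 },
	{ TOY_PTR, 2, 0 },
	{ TOY_FUNC_PROTO, -1, 3 },
};
```

With these tables, toy_select_handler(toy_four_params, 3, 0) picks
TOY_ON_SWITCH4 and toy_select_handler(toy_three_params, 3, 0) picks
TOY_ON_SWITCH3. Defaulting to the three-argument handler on any lookup
failure keeps the fallback in one place, which is the same design choice
the patch makes with its single old_format exit.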