From patchwork Fri May 6 20:16:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Namhyung Kim X-Patchwork-Id: 12841573 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B5CC9C4332F for ; Fri, 6 May 2022 20:16:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442174AbiEFUUS (ORCPT ); Fri, 6 May 2022 16:20:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47474 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442119AbiEFUUR (ORCPT ); Fri, 6 May 2022 16:20:17 -0400 Received: from mail-pl1-x635.google.com (mail-pl1-x635.google.com [IPv6:2607:f8b0:4864:20::635]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 445FB6623A; Fri, 6 May 2022 13:16:32 -0700 (PDT) Received: by mail-pl1-x635.google.com with SMTP id c11so8456877plg.13; Fri, 06 May 2022 13:16:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=xVPHxCseNXWYBUFBOD07R/7B+vT2nTqbIkg8OWHEgxY=; b=k3BfgdGXLvlI836C2542lBsuso+gNHop0YfdQoGdBSyqb5lWr1eiymoZDYsH1028CB 42dsRyerDehiuFukTV3jL0QWwhWQJ6ez0QrcxGjyB6oBhOikjRUBASLg5ol5S6/cNF9X lGNUyyDdN8ZYohNytOA9DLqHsInm0FvdhaR58uD45nqxyKWH2oknRHYIy6glYxRdY8ap Wp6EPbejaNZC4uEMArbZf8TNgu+/VNR9Uy5044CC9Ino5NjIIY9yXaXTrwGHIEfouEvn qYtYknRT3OX0U32QXS86wgQFnwwB1tmF1LDg7e0dB23GfUTPPAZrUomRNocb8eXAJcDD ZM6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references:mime-version:content-transfer-encoding; bh=xVPHxCseNXWYBUFBOD07R/7B+vT2nTqbIkg8OWHEgxY=; b=5HwYAxIExevki8WNH54nCk5VvbmDtALBknHUD6HXFtlp28UihixX09JfdQze8xtgkM QbOJA2yQbsiEXpMZ3JN2zikWKMyDTQRZzUgBbX/ByopjBquINZD4zkcjElCacNiu5Mq3 kssukMgOSBmQqLuGLL0GuPzxUK9fBiDl1naBjaIfEn3s0tV3F3WvLYxh5XWK3mNO2MT7 qBiZDXyk8io+yeLCpe/0zMRlwCpp4Gmc19v4XxtwUfYoJEe8/SqlX8dz6YCQHTbr8s/j BmwL5AilXq3pW/5+4/6TIZBsWQN10T1wrU3Fz82MoXniY9p4+gTAjptI0+k0Xr2qmnBl OOHg== X-Gm-Message-State: AOAM533X9fV0+Nxw/nLnbtAf7WBKU/qPLfCu0qoYIm1G1ZlDQO49bwBa i1IMODckSpqVfIYKeJbVQzo= X-Google-Smtp-Source: ABdhPJzC+86nnQZIwXKlv4YSRSEYK5NJoTg55E8ZuuucCCfZ1l8HnbpmcFRPNaaq9BVwmf5rkbAzVQ== X-Received: by 2002:a17:902:c7d3:b0:15e:b2f1:15f4 with SMTP id r19-20020a170902c7d300b0015eb2f115f4mr5448403pla.39.1651868191795; Fri, 06 May 2022 13:16:31 -0700 (PDT) Received: from balhae.hsd1.ca.comcast.net ([2601:647:4f00:3590:a5d1:d7b7:dd61:c87b]) by smtp.gmail.com with ESMTPSA id m2-20020a170902db8200b0015e8d4eb268sm2160156pld.178.2022.05.06.13.16.30 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 May 2022 13:16:31 -0700 (PDT) Sender: Namhyung Kim From: Namhyung Kim To: Arnaldo Carvalho de Melo , Jiri Olsa Cc: Ingo Molnar , Peter Zijlstra , LKML , Andi Kleen , Ian Rogers , Song Liu , Hao Luo , Milian Wolff , bpf@vger.kernel.org, linux-perf-users@vger.kernel.org, Blake Jones Subject: [PATCH 1/4] perf report: Do not extend sample type of bpf-output event Date: Fri, 6 May 2022 13:16:24 -0700 Message-Id: <20220506201627.85598-2-namhyung@kernel.org> X-Mailer: git-send-email 2.36.0.512.ge40c2bad7a-goog In-Reply-To: <20220506201627.85598-1-namhyung@kernel.org> References: <20220506201627.85598-1-namhyung@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Currently evsel__new_idx() sets more sample_type bits when it finds a BPF-output event. But it should honor what's recorded in the perf data file rather than blindly sets the bits. Otherwise it could lead to a parse error when it recorded with a modified sample_type. Signed-off-by: Namhyung Kim --- tools/perf/util/evsel.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index d38722560e80..63a8b832b822 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -269,8 +269,8 @@ struct evsel *evsel__new_idx(struct perf_event_attr *attr, int idx) return NULL; evsel__init(evsel, attr, idx); - if (evsel__is_bpf_output(evsel)) { - evsel->core.attr.sample_type |= (PERF_SAMPLE_RAW | PERF_SAMPLE_TIME | + if (evsel__is_bpf_output(evsel) && !attr->sample_type) { + evsel->core.attr.sample_type = (PERF_SAMPLE_RAW | PERF_SAMPLE_TIME | PERF_SAMPLE_CPU | PERF_SAMPLE_PERIOD), evsel->core.attr.sample_period = 1; } From patchwork Fri May 6 20:16:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Namhyung Kim X-Patchwork-Id: 12841574 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id DBE6DC433F5 for ; Fri, 6 May 2022 20:16:39 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442209AbiEFUUU (ORCPT ); Fri, 6 May 2022 16:20:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47524 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442200AbiEFUUT (ORCPT ); Fri, 6 May 2022 16:20:19 -0400 Received: from mail-pf1-x42d.google.com (mail-pf1-x42d.google.com [IPv6:2607:f8b0:4864:20::42d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6E45264725; Fri, 6 May 2022 13:16:34 -0700 (PDT) Received: by mail-pf1-x42d.google.com with SMTP id i24so7120036pfa.7; Fri, 06 May 2022 13:16:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=j5e+0/NGORUxhIij6Y0NpEwSlQRm9Kajl3i8RNysU1Y=; b=AbiDD3LH8Oqrdyx674DW7FA05GS704JhKH1cEU8Hgn0OgbkgzeeG0BB9vKavde+nSB NTd+MWI5/Dihl1+VtlwqB/vpm1mLojYkCFwwN/KOvn3u4r0o34q4mjr2RMbL8x0pK6X+ c09TykfQtX/aGh5fzMKonZPTgRsusClDMnBFhe4ZVMRCJsdHw0qHF3xg+ZiIYRyWJaSX 7XSTwj/aR0u+1HJkDKsX2oHsMKlAeUB4JsFcjtBYN68pr0fK0hQW2nBRLCnMKf51zhE3 ZFIC18OPU3D7B9U19RoV7bdI6CASrIuKJ4zx2dYiX8RN4i1ZbpUsqMewjc0qKFp0HBym WTsg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references:mime-version:content-transfer-encoding; bh=j5e+0/NGORUxhIij6Y0NpEwSlQRm9Kajl3i8RNysU1Y=; b=8SLICBt0kX3Ci5jxGuxzY9P9sqqf+Q+tilsVSgLq87UQaGOsj88CC2fSOuvvIcJwZ+ qExthVib2ISa3Qm9jj2+7t+2Fsp6ysejt4VI73Uz29hAOUTpKZ17HB9KbiAnANtyZ7d6 +W2K/+VcqtCDdiZDD6950lEy6CPb8kRJWdglmFL0LrvY/lEaRfCWXHbpFqW2/eLZZwCT kx8aBO6bhWX+ZeOqZi79Qm0g4Mw9de4La5Ta/YdqrE2OQjSrQuRH7waSC4yGrC+JRScK gXXyZ25lzxJFHDjNAhLwXYBaBKXXwiwh7axAjfmEFxELCViVd05hNkUZOYeuJu3irDHu UomQ== X-Gm-Message-State: AOAM532SxRmNZ9EMftv2cyRamexSjcVmPdTrFp/Qdtn4rBC67aQdaGLq jAFSUzGUkToXE06nXWNSK7E= X-Google-Smtp-Source: ABdhPJwNy7lGLJlcjqfa9E6MO7W+Q4FkVbVelg+xS6nntduyYOnwZh1t+wcrtDuF62WZNQDlMVSrjA== X-Received: by 2002:a63:384e:0:b0:374:ae28:71fc with SMTP id h14-20020a63384e000000b00374ae2871fcmr4094137pgn.159.1651868193256; Fri, 06 May 2022 13:16:33 -0700 (PDT) Received: from balhae.hsd1.ca.comcast.net ([2601:647:4f00:3590:a5d1:d7b7:dd61:c87b]) by smtp.gmail.com with ESMTPSA id m2-20020a170902db8200b0015e8d4eb268sm2160156pld.178.2022.05.06.13.16.32 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 May 2022 13:16:32 -0700 (PDT) Sender: Namhyung Kim From: Namhyung Kim To: Arnaldo Carvalho de Melo , Jiri Olsa Cc: Ingo Molnar , Peter Zijlstra , LKML , Andi Kleen , Ian Rogers , Song Liu , Hao Luo , Milian Wolff , bpf@vger.kernel.org, linux-perf-users@vger.kernel.org, Blake Jones Subject: [PATCH 2/4] perf record: Enable off-cpu analysis with BPF Date: Fri, 6 May 2022 13:16:25 -0700 Message-Id: <20220506201627.85598-3-namhyung@kernel.org> X-Mailer: git-send-email 2.36.0.512.ge40c2bad7a-goog In-Reply-To: <20220506201627.85598-1-namhyung@kernel.org> References: <20220506201627.85598-1-namhyung@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Add --off-cpu option to enable the off-cpu profiling with BPF. It'd use a bpf_output event and rename it to "offcpu-time". Samples will be synthesized at the end of the record session using data from a BPF map which contains the aggregated off-cpu time at context switches. So it needs root privilege to get the off-cpu profiling. Each sample will have a separate user stacktrace so it will skip kernel threads. The sample ip will be set from the stacktrace and other sample data will be updated accordingly. Currently it only handles some basic sample types. The sample timestamp is set to a dummy value just not to bother with other events during the sorting. So it has a very big initial value and increase it on processing each samples. Good thing is that it can be used together with regular profiling like cpu cycles. If you don't want to that, you can use a dummy event to enable off-cpu profiling only. Example output: $ sudo perf record --off-cpu perf bench sched messaging -l 1000 $ sudo perf report --stdio --call-graph=no # Total Lost Samples: 0 # # Samples: 41K of event 'cycles' # Event count (approx.): 42137343851 ... # Samples: 1K of event 'offcpu-time' # Event count (approx.): 587990831640 # # Children Self Command Shared Object Symbol # ........ ........ ............... .................. ......................... # 81.66% 0.00% sched-messaging libc-2.33.so [.] __libc_start_main 81.66% 0.00% sched-messaging perf [.] cmd_bench 81.66% 0.00% sched-messaging perf [.] main 81.66% 0.00% sched-messaging perf [.] run_builtin 81.43% 0.00% sched-messaging perf [.] bench_sched_messaging 40.86% 40.86% sched-messaging libpthread-2.33.so [.] __read 37.66% 37.66% sched-messaging libpthread-2.33.so [.] __write 2.91% 2.91% sched-messaging libc-2.33.so [.] __poll ... As you can see it spent most of off-cpu time in read and write in bench_sched_messaging(). The --call-graph=no was added just to make the output concise here. It uses perf hooks facility to control BPF program during the record session rather than adding new BPF/off-cpu specific calls. Signed-off-by: Namhyung Kim --- tools/perf/Documentation/perf-record.txt | 10 ++ tools/perf/Makefile.perf | 1 + tools/perf/builtin-record.c | 21 +++ tools/perf/util/Build | 1 + tools/perf/util/bpf_off_cpu.c | 208 +++++++++++++++++++++++ tools/perf/util/bpf_skel/off_cpu.bpf.c | 139 +++++++++++++++ tools/perf/util/off_cpu.h | 22 +++ 7 files changed, 402 insertions(+) create mode 100644 tools/perf/util/bpf_off_cpu.c create mode 100644 tools/perf/util/bpf_skel/off_cpu.bpf.c create mode 100644 tools/perf/util/off_cpu.h diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 465be4e62a17..8084e3fb73a1 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -758,6 +758,16 @@ include::intel-hybrid.txt[] If the URLs is not specified, the value of DEBUGINFOD_URLS system environment variable is used. +--off-cpu:: + Enable off-cpu profiling with BPF. The BPF program will collect + task scheduling information with (user) stacktrace and save them + as sample data of a software event named "offcpu-time". The + sample period will have the time the task slept in nano-second. + + Note that BPF can collect stacktrace using frame pointer ("fp") + only, as of now. So the applications built without the frame + pointer might see bogus addresses. + SEE ALSO -------- linkperf:perf-stat[1], linkperf:perf-list[1], linkperf:perf-intel-pt[1] diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf index 6e5aded855cc..8f738e11356d 100644 --- a/tools/perf/Makefile.perf +++ b/tools/perf/Makefile.perf @@ -1038,6 +1038,7 @@ SKEL_TMP_OUT := $(abspath $(SKEL_OUT)/.tmp) SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h SKELETONS += $(SKEL_OUT)/bperf_leader.skel.h $(SKEL_OUT)/bperf_follower.skel.h SKELETONS += $(SKEL_OUT)/bperf_cgroup.skel.h $(SKEL_OUT)/func_latency.skel.h +SKELETONS += $(SKEL_OUT)/off_cpu.skel.h $(SKEL_TMP_OUT) $(LIBBPF_OUTPUT): $(Q)$(MKDIR) -p $@ diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index a5cf6a99d67f..4e0285ca9a2d 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -49,6 +49,7 @@ #include "util/clockid.h" #include "util/pmu-hybrid.h" #include "util/evlist-hybrid.h" +#include "util/off_cpu.h" #include "asm/bug.h" #include "perf.h" #include "cputopo.h" @@ -162,6 +163,7 @@ struct record { bool buildid_mmap; bool timestamp_filename; bool timestamp_boundary; + bool off_cpu; struct switch_output switch_output; unsigned long long samples; unsigned long output_max_size; /* = 0: unlimited */ @@ -903,6 +905,11 @@ static int record__config_text_poke(struct evlist *evlist) return 0; } +static int record__config_off_cpu(struct record *rec) +{ + return off_cpu_prepare(rec->evlist); +} + static bool record__kcore_readable(struct machine *machine) { char kcore[PATH_MAX]; @@ -2600,6 +2607,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv) } else status = err; + if (rec->off_cpu) + rec->bytes_written += off_cpu_write(rec->session); + record__synthesize(rec, true); /* this will be recalculated during process_buildids() */ rec->samples = 0; @@ -3324,6 +3334,9 @@ static struct option __record_options[] = { OPT_CALLBACK_OPTARG(0, "threads", &record.opts, NULL, "spec", "write collected trace data into several data files using parallel threads", record__parse_threads), +#ifdef HAVE_BPF_SKEL + OPT_BOOLEAN(0, "off-cpu", &record.off_cpu, "Enable off-cpu analysis"), +#endif OPT_END() }; @@ -3981,6 +3994,14 @@ int cmd_record(int argc, const char **argv) } } + if (rec->off_cpu) { + err = record__config_off_cpu(rec); + if (err) { + pr_err("record__config_off_cpu failed, error %d\n", err); + goto out; + } + } + if (record_opts__config(&rec->opts)) { err = -EINVAL; goto out; diff --git a/tools/perf/util/Build b/tools/perf/util/Build index 9a7209a99e16..a51267d88ca9 100644 --- a/tools/perf/util/Build +++ b/tools/perf/util/Build @@ -147,6 +147,7 @@ perf-$(CONFIG_LIBBPF) += bpf_map.o perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter.o perf-$(CONFIG_PERF_BPF_SKEL) += bpf_counter_cgroup.o perf-$(CONFIG_PERF_BPF_SKEL) += bpf_ftrace.o +perf-$(CONFIG_PERF_BPF_SKEL) += bpf_off_cpu.o perf-$(CONFIG_BPF_PROLOGUE) += bpf-prologue.o perf-$(CONFIG_LIBELF) += symbol-elf.o perf-$(CONFIG_LIBELF) += probe-file.o diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c new file mode 100644 index 000000000000..1f87d2a9b86d --- /dev/null +++ b/tools/perf/util/bpf_off_cpu.c @@ -0,0 +1,208 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "util/bpf_counter.h" +#include "util/debug.h" +#include "util/evsel.h" +#include "util/evlist.h" +#include "util/off_cpu.h" +#include "util/perf-hooks.h" +#include "util/session.h" +#include + +#include "bpf_skel/off_cpu.skel.h" + +#define MAX_STACKS 32 +/* we don't need actual timestamp, just want to put the samples at last */ +#define OFF_CPU_TIMESTAMP (~0ull << 32) + +static struct off_cpu_bpf *skel; + +struct off_cpu_key { + u32 pid; + u32 tgid; + u32 stack_id; + u32 state; + u64 cgroup_id; +}; + +union off_cpu_data { + struct perf_event_header hdr; + u64 array[1024 / sizeof(u64)]; +}; + +static int off_cpu_config(struct evlist *evlist) +{ + struct evsel *evsel; + struct perf_event_attr attr = { + .type = PERF_TYPE_SOFTWARE, + .config = PERF_COUNT_SW_BPF_OUTPUT, + .size = sizeof(attr), /* to capture ABI version */ + }; + + evsel = evsel__new(&attr); + if (!evsel) + return -ENOMEM; + + evsel->core.attr.freq = 1; + evsel->core.attr.sample_period = 1; + /* off-cpu analysis depends on stack trace */ + evsel->core.attr.sample_type = PERF_SAMPLE_CALLCHAIN; + + evlist__add(evlist, evsel); + + free(evsel->name); + evsel->name = strdup("offcpu-time"); + if (evsel->name == NULL) + return -ENOMEM; + + return 0; +} + +static void off_cpu_start(void *arg __maybe_unused) +{ + skel->bss->enabled = 1; +} + +static void off_cpu_finish(void *arg __maybe_unused) +{ + skel->bss->enabled = 0; + off_cpu_bpf__destroy(skel); +} + +int off_cpu_prepare(struct evlist *evlist) +{ + int err; + + if (off_cpu_config(evlist) < 0) { + pr_err("Failed to config off-cpu BPF event\n"); + return -1; + } + + set_max_rlimit(); + + skel = off_cpu_bpf__open_and_load(); + if (!skel) { + pr_err("Failed to open off-cpu skeleton\n"); + return -1; + } + + err = off_cpu_bpf__attach(skel); + if (err) { + pr_err("Failed to attach off-cpu skeleton\n"); + goto out; + } + + if (perf_hooks__set_hook("record_start", off_cpu_start, NULL) || + perf_hooks__set_hook("record_end", off_cpu_finish, NULL)) { + pr_err("Failed to attach off-cpu skeleton\n"); + goto out; + } + + return 0; + +out: + off_cpu_bpf__destroy(skel); + return -1; +} + +int off_cpu_write(struct perf_session *session) +{ + int bytes = 0, size; + int fd, stack; + bool found = false; + u64 sample_type, val, sid = 0; + struct evsel *evsel; + struct perf_data_file *file = &session->data->file; + struct off_cpu_key prev, key; + union off_cpu_data data = { + .hdr = { + .type = PERF_RECORD_SAMPLE, + .misc = PERF_RECORD_MISC_USER, + }, + }; + u64 tstamp = OFF_CPU_TIMESTAMP; + + skel->bss->enabled = 0; + + evlist__for_each_entry(session->evlist, evsel) { + if (!strcmp(evsel__name(evsel), "offcpu-time")) { + found = true; + break; + } + } + + if (!found) { + pr_err("offcpu-time evsel not found\n"); + return 0; + } + + sample_type = evsel->core.attr.sample_type; + + if (sample_type & (PERF_SAMPLE_ID | PERF_SAMPLE_IDENTIFIER)) { + if (evsel->core.id) + sid = evsel->core.id[0]; + } + + fd = bpf_map__fd(skel->maps.off_cpu); + stack = bpf_map__fd(skel->maps.stacks); + memset(&prev, 0, sizeof(prev)); + + while (!bpf_map_get_next_key(fd, &prev, &key)) { + int n = 1; /* start from perf_event_header */ + int ip_pos = -1; + + bpf_map_lookup_elem(fd, &key, &val); + + if (sample_type & PERF_SAMPLE_IDENTIFIER) + data.array[n++] = sid; + if (sample_type & PERF_SAMPLE_IP) { + ip_pos = n; + data.array[n++] = 0; /* will be updated */ + } + if (sample_type & PERF_SAMPLE_TID) + data.array[n++] = (u64)key.pid << 32 | key.tgid; + if (sample_type & PERF_SAMPLE_TIME) + data.array[n++] = tstamp; + if (sample_type & PERF_SAMPLE_ID) + data.array[n++] = sid; + if (sample_type & PERF_SAMPLE_CPU) + data.array[n++] = 0; + if (sample_type & PERF_SAMPLE_PERIOD) + data.array[n++] = val; + if (sample_type & PERF_SAMPLE_CALLCHAIN) { + int len = 0; + + /* data.array[n] is callchain->nr (updated later) */ + data.array[n + 1] = PERF_CONTEXT_USER; + data.array[n + 2] = 0; + + bpf_map_lookup_elem(stack, &key.stack_id, &data.array[n + 2]); + while (data.array[n + 2 + len]) + len++; + + /* update length of callchain */ + data.array[n] = len + 1; + + /* update sample ip with the first callchain entry */ + if (ip_pos >= 0) + data.array[ip_pos] = data.array[n + 2]; + + /* calculate sample callchain data array length */ + n += len + 2; + } + /* TODO: handle more sample types */ + + size = n * sizeof(u64); + data.hdr.size = size; + bytes += size; + + if (perf_data_file__write(file, &data, size) < 0) { + pr_err("failed to write perf data, error: %m\n"); + return bytes; + } + + prev = key; + /* increase dummy timestamp to sort later samples */ + tstamp++; + } + return bytes; +} diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c new file mode 100644 index 000000000000..5173ed882fdf --- /dev/null +++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c @@ -0,0 +1,139 @@ +// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) +// Copyright (c) 2022 Google +#include "vmlinux.h" +#include +#include +#include + +/* task->flags for off-cpu analysis */ +#define PF_KTHREAD 0x00200000 /* I am a kernel thread */ + +/* task->state for off-cpu analysis */ +#define TASK_INTERRUPTIBLE 0x0001 +#define TASK_UNINTERRUPTIBLE 0x0002 + +#define MAX_STACKS 32 +#define MAX_ENTRIES 102400 + +struct tstamp_data { + __u32 stack_id; + __u32 state; + __u64 timestamp; +}; + +struct offcpu_key { + __u32 pid; + __u32 tgid; + __u32 stack_id; + __u32 state; +}; + +struct { + __uint(type, BPF_MAP_TYPE_STACK_TRACE); + __uint(key_size, sizeof(__u32)); + __uint(value_size, MAX_STACKS * sizeof(__u64)); + __uint(max_entries, MAX_ENTRIES); +} stacks SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_TASK_STORAGE); + __uint(map_flags, BPF_F_NO_PREALLOC); + __type(key, int); + __type(value, struct tstamp_data); +} tstamp SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(struct offcpu_key)); + __uint(value_size, sizeof(__u64)); + __uint(max_entries, MAX_ENTRIES); +} off_cpu SEC(".maps"); + +/* old kernel task_struct definition */ +struct task_struct___old { + long state; +} __attribute__((preserve_access_index)); + +int enabled = 0; + +/* + * Old kernel used to call it task_struct->state and now it's '__state'. + * Use BPF CO-RE "ignored suffix rule" to deal with it like below: + * + * https://nakryiko.com/posts/bpf-core-reference-guide/#handling-incompatible-field-and-type-changes + */ +static inline int get_task_state(struct task_struct *t) +{ + if (bpf_core_field_exists(t->__state)) + return BPF_CORE_READ(t, __state); + + /* recast pointer to capture task_struct___old type for compiler */ + struct task_struct___old *t_old = (void *)t; + + /* now use old "state" name of the field */ + return BPF_CORE_READ(t_old, state); +} + +SEC("tp_btf/sched_switch") +int on_switch(u64 *ctx) +{ + __u64 ts; + int state; + __u32 stack_id; + struct task_struct *prev, *next; + struct tstamp_data *pelem; + + if (!enabled) + return 0; + + prev = (struct task_struct *)ctx[1]; + next = (struct task_struct *)ctx[2]; + state = get_task_state(prev); + + ts = bpf_ktime_get_ns(); + + if (prev->flags & PF_KTHREAD) + goto next; + if (state != TASK_INTERRUPTIBLE && + state != TASK_UNINTERRUPTIBLE) + goto next; + + stack_id = bpf_get_stackid(ctx, &stacks, + BPF_F_FAST_STACK_CMP | BPF_F_USER_STACK); + + pelem = bpf_task_storage_get(&tstamp, prev, NULL, + BPF_LOCAL_STORAGE_GET_F_CREATE); + if (!pelem) + goto next; + + pelem->timestamp = ts; + pelem->state = state; + pelem->stack_id = stack_id; + +next: + pelem = bpf_task_storage_get(&tstamp, next, NULL, 0); + + if (pelem && pelem->timestamp) { + struct offcpu_key key = { + .pid = next->pid, + .tgid = next->tgid, + .stack_id = pelem->stack_id, + .state = pelem->state, + }; + __u64 delta = ts - pelem->timestamp; + __u64 *total; + + total = bpf_map_lookup_elem(&off_cpu, &key); + if (total) + *total += delta; + else + bpf_map_update_elem(&off_cpu, &key, &delta, BPF_ANY); + + /* prevent to reuse the timestamp later */ + pelem->timestamp = 0; + } + + return 0; +} + +char LICENSE[] SEC("license") = "Dual BSD/GPL"; diff --git a/tools/perf/util/off_cpu.h b/tools/perf/util/off_cpu.h new file mode 100644 index 000000000000..4113bd19a2e4 --- /dev/null +++ b/tools/perf/util/off_cpu.h @@ -0,0 +1,22 @@ +#ifndef PERF_UTIL_OFF_CPU_H +#define PERF_UTIL_OFF_CPU_H + +struct evlist; +struct perf_session; + +#ifdef HAVE_BPF_SKEL +int off_cpu_prepare(struct evlist *evlist); +int off_cpu_write(struct perf_session *session); +#else +static inline int off_cpu_prepare(struct evlist *evlist __maybe_unused) +{ + return -1; +} + +static inline int off_cpu_write(struct perf_session *session __maybe_unused) +{ + return -1; +} +#endif + +#endif /* PERF_UTIL_OFF_CPU_H */ From patchwork Fri May 6 20:16:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Namhyung Kim X-Patchwork-Id: 12841575 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 04084C433FE for ; Fri, 6 May 2022 20:16:40 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442345AbiEFUUV (ORCPT ); Fri, 6 May 2022 16:20:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47554 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442229AbiEFUUU (ORCPT ); Fri, 6 May 2022 16:20:20 -0400 Received: from mail-pj1-x1029.google.com (mail-pj1-x1029.google.com [IPv6:2607:f8b0:4864:20::1029]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 755D16623A; Fri, 6 May 2022 13:16:35 -0700 (PDT) Received: by mail-pj1-x1029.google.com with SMTP id e24so7959714pjt.2; Fri, 06 May 2022 13:16:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=2EyyhftSoRxaWVuV8cJi4YZ+Ok2YzcnoV0luuIHn0S0=; b=oGj4iJ9vqFHoG1R4Pp0EWdZAtUkW3VqrbTKKuGLuzZ86cKqwOdQO/I8mDN0vkqn67U vkJviyHizuZ1blNP5dVezLmhkujFdBBvWJGWfKorJD7/vRUuENqspWPWFT/dfenZxvAk uLi2XfhOtdA0bOrqYqbUHkvjHtXgpgV7udJMotGnC/tS2/waQETB4vSmKctcPdeNImyW h6TTzad1xx6CQCAmmeYPTM+YIEJpVJRNsEsCoYPPorO1RVTjUEUih2wgC2lcerRrrSMA zL50j4ZSjB9CmW09WBiGePdHlBleqtXz1VW9nTQkFN4EcVHmYHcXASfNGlhKoK8huYrD z6gw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references:mime-version:content-transfer-encoding; bh=2EyyhftSoRxaWVuV8cJi4YZ+Ok2YzcnoV0luuIHn0S0=; b=BwDGwNRbgSDvmoCyB1TFsr2YiDwPilZyVyXyOZ9ZMG0DW+cbue//cZjwwWU3DT2AMf MGBj2ZYwkcZUbMccOX4jqPF6r/1C3SgIoQw81bpqp1nkWWoW4Bw3JeTcbOF0gC/sShd+ fBQ3q+8XDI3z6sCHyupHehaUTJ7V7FRy8xeaneUa6sbnf3n4zkEfvlzgJjQD4hrU60+I 8aTU0J4QyPj8MKRmcKsiWAbfWy5IGd+OBFLxES+PQrdHr6NczfaSSfKunYL48GtWJIoK rA22xrVO8XaNjiAAD6Gc+DU0xJ7uhZrqFkGou3PhBNthWwqUvQVJfsGf/7wDbDNDKggh tAAQ== X-Gm-Message-State: AOAM531SwvzoPtUce6+NfzvzLFwH5HAKSUB/BcVFwoRN6AEwXHp0g1D+ M9ItsmNO6nI5MfZY7ZvKZyo= X-Google-Smtp-Source: ABdhPJwy3PjhjOhTR7pEpj5rSgVPOT4b4d68euOKBFUdZJse214qWoh4Xl5aqEf3EJxk/583tWJmOQ== X-Received: by 2002:a17:903:240a:b0:14e:dad4:5ce4 with SMTP id e10-20020a170903240a00b0014edad45ce4mr5443166plo.125.1651868194896; Fri, 06 May 2022 13:16:34 -0700 (PDT) Received: from balhae.hsd1.ca.comcast.net ([2601:647:4f00:3590:a5d1:d7b7:dd61:c87b]) by smtp.gmail.com with ESMTPSA id m2-20020a170902db8200b0015e8d4eb268sm2160156pld.178.2022.05.06.13.16.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 May 2022 13:16:34 -0700 (PDT) Sender: Namhyung Kim From: Namhyung Kim To: Arnaldo Carvalho de Melo , Jiri Olsa Cc: Ingo Molnar , Peter Zijlstra , LKML , Andi Kleen , Ian Rogers , Song Liu , Hao Luo , Milian Wolff , bpf@vger.kernel.org, linux-perf-users@vger.kernel.org, Blake Jones Subject: [PATCH 3/4] perf record: Implement basic filtering for off-cpu Date: Fri, 6 May 2022 13:16:26 -0700 Message-Id: <20220506201627.85598-4-namhyung@kernel.org> X-Mailer: git-send-email 2.36.0.512.ge40c2bad7a-goog In-Reply-To: <20220506201627.85598-1-namhyung@kernel.org> References: <20220506201627.85598-1-namhyung@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org It should honor cpu and task filtering with -a, -C or -p, -t options. Signed-off-by: Namhyung Kim --- tools/perf/builtin-record.c | 2 +- tools/perf/util/bpf_off_cpu.c | 78 +++++++++++++++++++++++--- tools/perf/util/bpf_skel/off_cpu.bpf.c | 52 +++++++++++++++-- tools/perf/util/off_cpu.h | 6 +- 4 files changed, 123 insertions(+), 15 deletions(-) diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 4e0285ca9a2d..c9b78e2b74bb 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -907,7 +907,7 @@ static int record__config_text_poke(struct evlist *evlist) static int record__config_off_cpu(struct record *rec) { - return off_cpu_prepare(rec->evlist); + return off_cpu_prepare(rec->evlist, &rec->opts.target); } static bool record__kcore_readable(struct machine *machine) diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c index 1f87d2a9b86d..89f36229041d 100644 --- a/tools/perf/util/bpf_off_cpu.c +++ b/tools/perf/util/bpf_off_cpu.c @@ -6,6 +6,9 @@ #include "util/off_cpu.h" #include "util/perf-hooks.h" #include "util/session.h" +#include "util/target.h" +#include "util/cpumap.h" +#include "util/thread_map.h" #include #include "bpf_skel/off_cpu.skel.h" @@ -57,8 +60,23 @@ static int off_cpu_config(struct evlist *evlist) return 0; } -static void off_cpu_start(void *arg __maybe_unused) +static void off_cpu_start(void *arg) { + struct evlist *evlist = arg; + + /* update task filter for the given workload */ + if (!skel->bss->has_cpu && !skel->bss->has_task && + perf_thread_map__pid(evlist->core.threads, 0) != -1) { + int fd; + u32 pid; + u8 val = 1; + + skel->bss->has_task = 1; + fd = bpf_map__fd(skel->maps.task_filter); + pid = perf_thread_map__pid(evlist->core.threads, 0); + bpf_map_update_elem(fd, &pid, &val, BPF_ANY); + } + skel->bss->enabled = 1; } @@ -68,31 +86,75 @@ static void off_cpu_finish(void *arg __maybe_unused) off_cpu_bpf__destroy(skel); } -int off_cpu_prepare(struct evlist *evlist) +int off_cpu_prepare(struct evlist *evlist, struct target *target) { - int err; + int err, fd, i; + int ncpus = 1, ntasks = 1; if (off_cpu_config(evlist) < 0) { pr_err("Failed to config off-cpu BPF event\n"); return -1; } - set_max_rlimit(); - - skel = off_cpu_bpf__open_and_load(); + skel = off_cpu_bpf__open(); if (!skel) { pr_err("Failed to open off-cpu skeleton\n"); return -1; } + /* don't need to set cpu filter for system-wide mode */ + if (target->cpu_list) { + ncpus = perf_cpu_map__nr(evlist->core.user_requested_cpus); + bpf_map__set_max_entries(skel->maps.cpu_filter, ncpus); + } + + if (target__has_task(target)) { + ncpus = perf_thread_map__nr(evlist->core.threads); + bpf_map__set_max_entries(skel->maps.task_filter, ntasks); + } + + set_max_rlimit(); + + err = off_cpu_bpf__load(skel); + if (err) { + pr_err("Failed to load off-cpu skeleton\n"); + goto out; + } + + if (target->cpu_list) { + u32 cpu; + u8 val = 1; + + skel->bss->has_cpu = 1; + fd = bpf_map__fd(skel->maps.cpu_filter); + + for (i = 0; i < ncpus; i++) { + cpu = perf_cpu_map__cpu(evlist->core.user_requested_cpus, i).cpu; + bpf_map_update_elem(fd, &cpu, &val, BPF_ANY); + } + } + + if (target__has_task(target)) { + u32 pid; + u8 val = 1; + + skel->bss->has_task = 1; + fd = bpf_map__fd(skel->maps.task_filter); + + for (i = 0; i < ntasks; i++) { + pid = perf_thread_map__pid(evlist->core.threads, i); + bpf_map_update_elem(fd, &pid, &val, BPF_ANY); + } + } + err = off_cpu_bpf__attach(skel); if (err) { pr_err("Failed to attach off-cpu skeleton\n"); goto out; } - if (perf_hooks__set_hook("record_start", off_cpu_start, NULL) || - perf_hooks__set_hook("record_end", off_cpu_finish, NULL)) { + if (perf_hooks__set_hook("record_start", off_cpu_start, evlist) || + perf_hooks__set_hook("record_end", off_cpu_finish, evlist)) { pr_err("Failed to attach off-cpu skeleton\n"); goto out; } diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c index 5173ed882fdf..c35106b9e20b 100644 --- a/tools/perf/util/bpf_skel/off_cpu.bpf.c +++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c @@ -49,12 +49,28 @@ struct { __uint(max_entries, MAX_ENTRIES); } off_cpu SEC(".maps"); +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u8)); + __uint(max_entries, 1); +} cpu_filter SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(key_size, sizeof(__u32)); + __uint(value_size, sizeof(__u8)); + __uint(max_entries, 1); +} task_filter SEC(".maps"); + /* old kernel task_struct definition */ struct task_struct___old { long state; } __attribute__((preserve_access_index)); int enabled = 0; +int has_cpu = 0; +int has_task = 0; /* * Old kernel used to call it task_struct->state and now it's '__state'. @@ -74,6 +90,37 @@ static inline int get_task_state(struct task_struct *t) return BPF_CORE_READ(t_old, state); } +static inline int can_record(struct task_struct *t, int state) +{ + if (has_cpu) { + __u32 cpu = bpf_get_smp_processor_id(); + __u8 *ok; + + ok = bpf_map_lookup_elem(&cpu_filter, &cpu); + if (!ok) + return 0; + } + + if (has_task) { + __u8 *ok; + __u32 pid = t->pid; + + ok = bpf_map_lookup_elem(&task_filter, &pid); + if (!ok) + return 0; + } + + /* kernel threads don't have user stack */ + if (t->flags & PF_KTHREAD) + return 0; + + if (state != TASK_INTERRUPTIBLE && + state != TASK_UNINTERRUPTIBLE) + return 0; + + return 1; +} + SEC("tp_btf/sched_switch") int on_switch(u64 *ctx) { @@ -92,10 +139,7 @@ int on_switch(u64 *ctx) ts = bpf_ktime_get_ns(); - if (prev->flags & PF_KTHREAD) - goto next; - if (state != TASK_INTERRUPTIBLE && - state != TASK_UNINTERRUPTIBLE) + if (!can_record(prev, state)) goto next; stack_id = bpf_get_stackid(ctx, &stacks, diff --git a/tools/perf/util/off_cpu.h b/tools/perf/util/off_cpu.h index 4113bd19a2e4..864d6148dfd3 100644 --- a/tools/perf/util/off_cpu.h +++ b/tools/perf/util/off_cpu.h @@ -2,13 +2,15 @@ #define PERF_UTIL_OFF_CPU_H struct evlist; +struct target; struct perf_session; #ifdef HAVE_BPF_SKEL -int off_cpu_prepare(struct evlist *evlist); +int off_cpu_prepare(struct evlist *evlist, struct target *target); int off_cpu_write(struct perf_session *session); #else -static inline int off_cpu_prepare(struct evlist *evlist __maybe_unused) +static inline int off_cpu_prepare(struct evlist *evlist __maybe_unused, + struct target *target __maybe_unused) { return -1; } From patchwork Fri May 6 20:16:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Namhyung Kim X-Patchwork-Id: 12841576 X-Patchwork-Delegate: bpf@iogearbox.net Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id EF01BC433FE for ; Fri, 6 May 2022 20:16:45 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442329AbiEFUU1 (ORCPT ); Fri, 6 May 2022 16:20:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47584 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1442323AbiEFUUV (ORCPT ); Fri, 6 May 2022 16:20:21 -0400 Received: from mail-pl1-x630.google.com (mail-pl1-x630.google.com [IPv6:2607:f8b0:4864:20::630]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C2D5E66AF5; Fri, 6 May 2022 13:16:36 -0700 (PDT) Received: by mail-pl1-x630.google.com with SMTP id n18so8478067plg.5; Fri, 06 May 2022 13:16:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=UL1WLDXYE0brFxgOgvbV8Z/RLTmnEOAPi1HjOcqSL8o=; b=DLINA9OgGy48buUfPlG851ZNEGGIaRvTtNPRBHOQHqecOEzw0DgbIjTuMyL9E/wz+i EYfh20blipGa84p+7u3kJyXDoO6xEslREWWAfBEsMN2TEtQjHAsRag+Rc9IoNeHZlzCB Oo+U0eVg4c7vY3+ePu+yCh15nXwoFOwSd+UQCUajocTlnlbBFU4gV37oyi3Qd8jVkqFC 4avz7xI8XIjzg6gQnTJ7sMjCSOqT3/SGlLHjN9fhyVg/E5HOyFFk+gfSNiuZeq6U5S8f ATut0nwQlnCRsnUujSjSiiJU+ho7VTffR58HQDVZ2EswcxlSp7iRabIAEfUHza8HYq4Q KWgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:sender:from:to:cc:subject:date:message-id :in-reply-to:references:mime-version:content-transfer-encoding; bh=UL1WLDXYE0brFxgOgvbV8Z/RLTmnEOAPi1HjOcqSL8o=; b=qNTKO9oXIVkD7pRnckxZRmvPZ2wWGgtt9mRpRP/on1EWv+nsToXoz0J45SCvmfp94C T1yhfepvUjH7US36fs6wfitEiCSwa9gi9jLJI/vDiCYX7vbkPuvaohfLaKd3IgvQakzX yOyioXj2sJOdipUiscMD+Pi2vVAXhCKxSoadoimShmr99NYq0DShWaoLmVdvBtDs7GEr emjWx8o9QD39sIEKT5bLhex1zKBygrfLqTvdjwC/TWfvyfvMmc5dz/xeV19dIfsdYXD2 ufibo7q38Wm+40O+ZKA5JitXMU5NKhvm6w/BTmcKOlAeaERekEqDfa0QsDrMoOvTSjIs X3vw== X-Gm-Message-State: AOAM5323Q1OozdiepKtdeoGBmp/2rr+ZjyO36syA1UOsl7b8+VKw1O/1 ZhsmsN5Pu6BKSAxBvSQVf3E= X-Google-Smtp-Source: ABdhPJxuPyLsjn+yEyNjyCJYB1NCIlQuoR5+JdQ95exFGCAh2nBokrDM/3FGYnbnZ/k/6vcPgac2YA== X-Received: by 2002:a17:90b:1b01:b0:1dc:46ed:90de with SMTP id nu1-20020a17090b1b0100b001dc46ed90demr14414040pjb.107.1651868196233; Fri, 06 May 2022 13:16:36 -0700 (PDT) Received: from balhae.hsd1.ca.comcast.net ([2601:647:4f00:3590:a5d1:d7b7:dd61:c87b]) by smtp.gmail.com with ESMTPSA id m2-20020a170902db8200b0015e8d4eb268sm2160156pld.178.2022.05.06.13.16.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 May 2022 13:16:35 -0700 (PDT) Sender: Namhyung Kim From: Namhyung Kim To: Arnaldo Carvalho de Melo , Jiri Olsa Cc: Ingo Molnar , Peter Zijlstra , LKML , Andi Kleen , Ian Rogers , Song Liu , Hao Luo , Milian Wolff , bpf@vger.kernel.org, linux-perf-users@vger.kernel.org, Blake Jones Subject: [PATCH 4/4] perf record: Handle argument change in sched_switch Date: Fri, 6 May 2022 13:16:27 -0700 Message-Id: <20220506201627.85598-5-namhyung@kernel.org> X-Mailer: git-send-email 2.36.0.512.ge40c2bad7a-goog In-Reply-To: <20220506201627.85598-1-namhyung@kernel.org> References: <20220506201627.85598-1-namhyung@kernel.org> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org Recently sched_switch tracepoint added a new argument for prev_state, but it's hard to handle the change in a BPF program. Instead, we can check the function prototype in BTF before loading the program. Thus I make two copies of the tracepoint handler and select one based on the BTF info. Signed-off-by: Namhyung Kim --- tools/perf/util/bpf_off_cpu.c | 28 +++++++++++++++ tools/perf/util/bpf_skel/off_cpu.bpf.c | 48 ++++++++++++++++++++------ 2 files changed, 65 insertions(+), 11 deletions(-) diff --git a/tools/perf/util/bpf_off_cpu.c b/tools/perf/util/bpf_off_cpu.c index 89f36229041d..31343db68ed3 100644 --- a/tools/perf/util/bpf_off_cpu.c +++ b/tools/perf/util/bpf_off_cpu.c @@ -86,6 +86,33 @@ static void off_cpu_finish(void *arg __maybe_unused) off_cpu_bpf__destroy(skel); } +/* recent kernel added prev_state arg, so it needs to call the proper function */ +static void check_sched_switch_args(void) +{ + const struct btf *btf = bpf_object__btf(skel->obj); + const struct btf_type *t1, *t2, *t3; + u32 type_id; + + type_id = btf__find_by_name_kind(btf, "bpf_trace_sched_switch", + BTF_KIND_TYPEDEF); + if ((s32)type_id < 0) + return; + + t1 = btf__type_by_id(btf, type_id); + if (t1 == NULL) + return; + + t2 = btf__type_by_id(btf, t1->type); + if (t2 == NULL || !btf_is_ptr(t2)) + return; + + t3 = btf__type_by_id(btf, t2->type); + if (t3 && btf_is_func_proto(t3) && btf_vlen(t3) == 4) { + /* new format: pass prev_state as 2nd arg */ + skel->rodata->has_prev_state = true; + } +} + int off_cpu_prepare(struct evlist *evlist, struct target *target) { int err, fd, i; @@ -114,6 +141,7 @@ int off_cpu_prepare(struct evlist *evlist, struct target *target) } set_max_rlimit(); + check_sched_switch_args(); err = off_cpu_bpf__load(skel); if (err) { diff --git a/tools/perf/util/bpf_skel/off_cpu.bpf.c b/tools/perf/util/bpf_skel/off_cpu.bpf.c index c35106b9e20b..98eaba95924f 100644 --- a/tools/perf/util/bpf_skel/off_cpu.bpf.c +++ b/tools/perf/util/bpf_skel/off_cpu.bpf.c @@ -72,6 +72,8 @@ int enabled = 0; int has_cpu = 0; int has_task = 0; +const volatile bool has_prev_state = false; + /* * Old kernel used to call it task_struct->state and now it's '__state'. * Use BPF CO-RE "ignored suffix rule" to deal with it like below: @@ -121,22 +123,13 @@ static inline int can_record(struct task_struct *t, int state) return 1; } -SEC("tp_btf/sched_switch") -int on_switch(u64 *ctx) +static int off_cpu_stat(u64 *ctx, struct task_struct *prev, + struct task_struct *next, int state) { __u64 ts; - int state; __u32 stack_id; - struct task_struct *prev, *next; struct tstamp_data *pelem; - if (!enabled) - return 0; - - prev = (struct task_struct *)ctx[1]; - next = (struct task_struct *)ctx[2]; - state = get_task_state(prev); - ts = bpf_ktime_get_ns(); if (!can_record(prev, state)) @@ -180,4 +173,37 @@ int on_switch(u64 *ctx) return 0; } +SEC("tp_btf/sched_switch") +int on_switch(u64 *ctx) +{ + struct task_struct *prev, *next; + int prev_state; + + if (!enabled) + return 0; + + /* + * For v5.18+: + * TP_PROTO(bool preempt, int prev_state, + * struct task_struct *prev, + * struct task_struct *next) + * + * On older kernels: + * TP_PROTO(bool preempt, struct task_struct *prev, + * struct task_struct *next) + */ + if (has_prev_state) { + prev = (struct task_struct *)ctx[2]; + next = (struct task_struct *)ctx[3]; + prev_state = (int)ctx[1]; + } else { + prev = (struct task_struct *)ctx[1]; + next = (struct task_struct *)ctx[2]; + + prev_state = get_task_state(prev); + } + + return off_cpu_stat(ctx, prev, next, prev_state); +} + char LICENSE[] SEC("license") = "Dual BSD/GPL";