From patchwork Mon Dec 19 16:12:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Clark X-Patchwork-Id: 13076979 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 67CD3C4332F for ; Mon, 19 Dec 2022 18:17:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-Id:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=MCrSS8uGe6rdwLQVKuBWXVzu4ovRnCMkPl4+4Eqj+ZM=; b=sFGxV0weCrzJ4x v67mzBZa4XYvCIqbbg+ofwpl1FmO3+jNfJvhhVmINXXmparPWh4wfHe8EAfIc0Oz9d39LLnCMkEbw 78lyJ9DJNJfhLlRhy9/ANRotx+ZvxfD6Nojb7eWz1gEpoWNJwx3dAtriQL6B9k3L4WZLTlxp1+YRS FnpBC+LAx0D4+5Rbvj/o48c7JKgeCjq7ct4lhkzVVE7547MqLBbgi571x0k0pzWWNotuzGzyGIhPR kvljojHrB5UYns4/vjehB8aV0LVUmf19FI8GISd/nnPiM30TJDTzyPIWb0TFQXy9kOJbIyfv9fOdr UaNDgjihqO27mPHxY0tw==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1p7KgR-00HIMc-KH; Mon, 19 Dec 2022 18:16:15 +0000 Received: from foss.arm.com ([217.140.110.172]) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1p7ImD-00Fs50-D3 for linux-arm-kernel@lists.infradead.org; Mon, 19 Dec 2022 16:14:08 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 85D70FEC; Mon, 19 Dec 2022 08:14:44 -0800 (PST) Received: from e126815.warwick.arm.com (e126815.arm.com [10.32.32.26]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 5C9433F703; Mon, 19 Dec 2022 08:14:01 -0800 (PST) From: James Clark To: linux-perf-users@vger.kernel.org Cc: robh@kernel.org, German Gomez , James Clark , Peter Zijlstra , Ingo Molnar , Arnaldo Carvalho de Melo , Mark Rutland , Alexander Shishkin , Jiri Olsa , Namhyung Kim , John Garry , Will Deacon , Mike Leach , Leo Yan , linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org Subject: [PATCH 4/4] perf report: Add 'simd' sort field Date: Mon, 19 Dec 2022 16:12:58 +0000 Message-Id: <20221219161259.3097213-5-james.clark@arm.com> X-Mailer: git-send-email 2.25.1 In-Reply-To: <20221219161259.3097213-1-james.clark@arm.com> References: <20221219161259.3097213-1-james.clark@arm.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20221219_081406_306288_DA6C3B48 X-CRM114-Status: GOOD ( 15.40 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org From: German Gomez Add 'simd' sort field to visualize SIMD ops in perf-report. Rows are labeled with the SIMD isa, and the type of predicate (if any): - [p] partial predicate - [e] empty predicate (no elements in the vector being used) Example with Arm SPE and SVE (Scalable Vector Extension): #include double src[1025], dst[1025]; int main(void) { svfloat64_t vc = svdup_f64(1); for(;;) for(int i = 0; i < 1025; i += svcntd()) { svbool_t pg = svwhilelt_b64(i, 1025); svfloat64_t vsrc = svld1(pg, &src[i]); svfloat64_t vdst = svadd_x(pg, vsrc, vc); svst1(pg, &dst[i], vdst); } return 0; } ... compiled using "gcc-11 -march=armv8-a+sve -O3" Profiling on a platform that implements FEAT_SVE and FEAT_SPEv1p1: $ perf record -e arm_spe_0// -- ./a.out $ perf report --itrace=i1i -s overhead,pid,simd,sym Overhead Pid:Command Simd Symbol ........ ................ ....... ...................... 53.76% 10758:program [.] main 46.14% 10758:program [.] SVE [.] main 0.09% 10758:program [p] SVE [.] main The report shows 0.09% of the sampled SVE operations use partial predicates due to src and dst arrays not being multiples of the vector register lengths. Signed-off-by: German Gomez Signed-off-by: James Clark --- tools/perf/Documentation/perf-report.txt | 1 + tools/perf/util/hist.c | 1 + tools/perf/util/hist.h | 1 + tools/perf/util/sort.c | 47 ++++++++++++++++++++++++ tools/perf/util/sort.h | 2 + 5 files changed, 52 insertions(+) diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 4fa509b15948..ff524d83a4a7 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -115,6 +115,7 @@ OPTIONS - p_stage_cyc: On powerpc, this presents the number of cycles spent in a pipeline stage. And currently supported only on powerpc. - addr: (Full) virtual address of the sampled instruction + - simd: Flags describing a SIMD operation. "e" for empty Arm SVE predicate. "p" for partial Arm SVE predicate By default, comm, dso and symbol keys are used. (i.e. --sort comm,dso,symbol) diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index 17a05e943b44..e2390114c495 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -742,6 +742,7 @@ __hists__add_entry(struct hists *hists, .weight = sample->weight, .ins_lat = sample->ins_lat, .p_stage_cyc = sample->p_stage_cyc, + .simd_flags = sample->simd_flags, }, *he = hists__findnew_entry(hists, &entry, al, sample_self); if (!hists->has_callchains && he && he->callchain_size != 0) diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h index ebd8a8f783ee..e6ecb4453053 100644 --- a/tools/perf/util/hist.h +++ b/tools/perf/util/hist.h @@ -80,6 +80,7 @@ enum hist_column { HISTC_ADDR_FROM, HISTC_ADDR_TO, HISTC_ADDR, + HISTC_SIMD, HISTC_NR_COLS, /* Last entry */ }; diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index 0ecc2cb13792..5c8bfea2ce34 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -131,6 +131,52 @@ struct sort_entry sort_thread = { .se_width_idx = HISTC_THREAD, }; +/* --sort simd */ + +static int64_t +sort__simd_cmp(struct hist_entry *left, struct hist_entry *right) +{ + if (left->simd_flags.arch != right->simd_flags.arch) + return (int64_t) left->simd_flags.arch - right->simd_flags.arch; + + return (int64_t) left->simd_flags.pred - right->simd_flags.pred; +} + +static const char *hist_entry__get_simd_name(struct simd_flags *simd_flags) +{ + u64 arch = simd_flags->arch; + + if (arch & SIMD_OP_FLAGS_ARCH_SVE) + return "SVE"; + else + return "n/a"; +} + +static int hist_entry__simd_snprintf(struct hist_entry *he, char *bf, + size_t size, unsigned int width __maybe_unused) +{ + const char *name; + + if (!he->simd_flags.arch) + return repsep_snprintf(bf, size, ""); + + name = hist_entry__get_simd_name(&he->simd_flags); + + if (he->simd_flags.pred & SIMD_OP_FLAGS_PRED_EMPTY) + return repsep_snprintf(bf, size, "[e] %s", name); + else if (he->simd_flags.pred & SIMD_OP_FLAGS_PRED_PARTIAL) + return repsep_snprintf(bf, size, "[p] %s", name); + + return repsep_snprintf(bf, size, "[.] %s", name); +} + +struct sort_entry sort_simd = { + .se_header = "Simd ", + .se_cmp = sort__simd_cmp, + .se_snprintf = hist_entry__simd_snprintf, + .se_width_idx = HISTC_SIMD, +}; + /* --sort comm */ /* @@ -2042,6 +2088,7 @@ static struct sort_dimension common_sort_dimensions[] = { DIM(SORT_LOCAL_PIPELINE_STAGE_CYC, "local_p_stage_cyc", sort_local_p_stage_cyc), DIM(SORT_GLOBAL_PIPELINE_STAGE_CYC, "p_stage_cyc", sort_global_p_stage_cyc), DIM(SORT_ADDR, "addr", sort_addr), + DIM(SORT_SIMD, "simd", sort_simd) }; #undef DIM diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h index 04ff8b61a2a7..8e69a2a53dc1 100644 --- a/tools/perf/util/sort.h +++ b/tools/perf/util/sort.h @@ -110,6 +110,7 @@ struct hist_entry { u64 p_stage_cyc; u8 cpumode; u8 depth; + struct simd_flags simd_flags; /* We are added by hists__add_dummy_entry. */ bool dummy; @@ -237,6 +238,7 @@ enum sort_type { SORT_LOCAL_PIPELINE_STAGE_CYC, SORT_GLOBAL_PIPELINE_STAGE_CYC, SORT_ADDR, + SORT_SIMD, /* branch stack specific sort keys */ __SORT_BRANCH_STACK,