[1/4] tools/perf/arch/riscv: record registers needed for --call-graph=dwarf

Message ID 20210705232524.4024832-2-edwin@etorok.net (mailing list archive)
State New, archived
Series RISC-V: make perf record --call-graph=dwarf work

Commit Message

Edwin Török July 5, 2021, 11:25 p.m. UTC
For libdw-based callgraph sampling to work we need to sample registers.

Tested on HiFive Unmatched with a trunk version of OCaml:
```
 # perf record --user-regs=?
available registers: pc ra sp gp tp t0 t1 t2 s0 s1 a0 a1 a2 a3 a4 a5 a6 a7
 s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 t3 t4 t5 t6

 # sed -e 's/(\*\*//' -e 's/\*\*)//' \
testsuite/tests/misc-unsafe/almabench.ml >almabench_benchmark.ml
 # ocamlopt almabench_benchmark.ml -unsafe -o almabench_benchmark
 # perf record -e cpu-clock -v --call-graph=dwarf -F 99 \
./almabench_benchmark

callchain: type DWARF
callchain: stack dump size 8192
nr_cblocks: 0
affinity: SYS
mmap flush: 1
comp level: 0
mmap size 528384B
mmap size 528384B
Control descriptor is not initialized
0 17.00 -26.06
1 12.34 1.29
2 6.83 22.95
3 0.04 -1.26
4 2.30 12.54
5 2.93 14.35
6 21.27 -16.57
7 20.41 -19.04
[ perf record: Woken up 65 times to write data ]
Looking at the vmlinux_path (8 entries long)
Failed to open /proc/kcore.
Note /proc/kcore requires CAP_SYS_RAWIO capability to access.
Using /proc/kallsyms for symbols
failed to write feature CPUDESC
failed to write feature CPUID
failed to write feature NUMA_TOPOLOGY
failed to write feature MEM_TOPOLOGY
[ perf record: Captured and wrote 16.265 MB perf.data (1997 samples) ]

 # perf script
...
almabench_bench  1193 14816.399235:   10101010 cpu-clock:
 3fbbd63d54 reduce_sincos+0x268 (inlined)
 3fbbd63d54 __cos+0x268 (/lib/libm-2.33.so)
 2aac4b261b Almabench_benchmark.planetpv+0x25f
    (/home/root/ocaml/almabench_benchmark)
 2aac4b3e03 Almabench_benchmark.entry+0x12bb
    (/home/root/ocaml/almabench_benchmark)
 2aac4b0097 caml_program+0x143
    (/home/root/ocaml/almabench_benchmark)
 2aac4e4e3d caml_start_program+0x71
    (/home/root/ocaml/almabench_benchmark)
 2aac4e5409 caml_startup_common+0x18f
    (/home/root/ocaml/almabench_benchmark)
 2aac4e5465 caml_startup_exn+0x9 (inlined)
 2aac4e5465 caml_startup+0x9 (inlined)
 2aac4e5465 caml_main+0x9
    (/home/root/ocaml/almabench_benchmark)
 2aac4afe89 main+0x9
    (/home/root/ocaml/almabench_benchmark)
 3fbbc47b0b __libc_start_main+0x85 (/lib/libc-2.33.so)
 2aac4afebb _start+0x2b (/home/root/ocaml/almabench_benchmark)
```

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: linux-perf-users@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Signed-off-by: Edwin Török <edwin@etorok.net>
---
 tools/perf/arch/riscv/util/perf_regs.c | 32 ++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

Comments

Palmer Dabbelt Aug. 4, 2021, 4:22 a.m. UTC | #1
On Mon, 05 Jul 2021 16:25:21 PDT (-0700), edwin@etorok.net wrote:
> For libdw-based callgraph sampling to work we need to sample registers.
>
> Tested on HiFive Unmatched with a trunk version of OCaml:

These generally LGTM, but I don't have patch #2.  Sometimes that means 
they're just not targeted at the RISC-V tree, which is fine with me but 
I'm happy to take some time to look closer and take them if that's what 
you're looking for.

A cover letter can be a good bet, to describe this sort of stuff.

> ```
>  # perf record --user-regs=?
> available registers: pc ra sp gp tp t0 t1 t2 s0 s1 a0 a1 a2 a3 a4 a5 a6 a7
>  s2 s3 s4 s5 s6 s7 s8 s9 s10 s11 t3 t4 t5 t6
>
>  # sed -e 's/(\*\*//' -e 's/\*\*)//' \
> testsuite/tests/misc-unsafe/almabench.ml >almabench_benchmark.ml
>  # ocamlopt almabench_benchmark.ml -unsafe -o almabench_benchmark
>  # perf record -e cpu-clock -v --call-graph=dwarf -F 99 \
> ./almabench_benchmark
>
> callchain: type DWARF
> callchain: stack dump size 8192
> nr_cblocks: 0
> affinity: SYS
> mmap flush: 1
> comp level: 0
> mmap size 528384B
> mmap size 528384B
> Control descriptor is not initialized
> 0 17.00 -26.06
> 1 12.34 1.29
> 2 6.83 22.95
> 3 0.04 -1.26
> 4 2.30 12.54
> 5 2.93 14.35
> 6 21.27 -16.57
> 7 20.41 -19.04
> [ perf record: Woken up 65 times to write data ]
> Looking at the vmlinux_path (8 entries long)
> Failed to open /proc/kcore.
> Note /proc/kcore requires CAP_SYS_RAWIO capability to access.
> Using /proc/kallsyms for symbols
> failed to write feature CPUDESC
> failed to write feature CPUID
> failed to write feature NUMA_TOPOLOGY
> failed to write feature MEM_TOPOLOGY
> [ perf record: Captured and wrote 16.265 MB perf.data (1997 samples) ]
>
>  # perf script
> ...
> almabench_bench  1193 14816.399235:   10101010 cpu-clock:
>  3fbbd63d54 reduce_sincos+0x268 (inlined)
>  3fbbd63d54 __cos+0x268 (/lib/libm-2.33.so)
>  2aac4b261b Almabench_benchmark.planetpv+0x25f
>     (/home/root/ocaml/almabench_benchmark)
>  2aac4b3e03 Almabench_benchmark.entry+0x12bb
>     (/home/root/ocaml/almabench_benchmark)
>  2aac4b0097 caml_program+0x143
>     (/home/root/ocaml/almabench_benchmark)
>  2aac4e4e3d caml_start_program+0x71
>     (/home/root/ocaml/almabench_benchmark)
>  2aac4e5409 caml_startup_common+0x18f
>     (/home/root/ocaml/almabench_benchmark)
>  2aac4e5465 caml_startup_exn+0x9 (inlined)
>  2aac4e5465 caml_startup+0x9 (inlined)
>  2aac4e5465 caml_main+0x9
>     (/home/root/ocaml/almabench_benchmark)
>  2aac4afe89 main+0x9
>     (/home/root/ocaml/almabench_benchmark)
>  3fbbc47b0b __libc_start_main+0x85 (/lib/libc-2.33.so)
>  2aac4afebb _start+0x2b (/home/root/ocaml/almabench_benchmark)
> ```
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
> Cc: Jiri Olsa <jolsa@redhat.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Paul Walmsley <paul.walmsley@sifive.com>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>
> Cc: Albert Ou <aou@eecs.berkeley.edu>
> Cc: linux-perf-users@vger.kernel.org
> Cc: linux-riscv@lists.infradead.org
> Signed-off-by: Edwin Török <edwin@etorok.net>
> ---
>  tools/perf/arch/riscv/util/perf_regs.c | 32 ++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
>
> diff --git a/tools/perf/arch/riscv/util/perf_regs.c b/tools/perf/arch/riscv/util/perf_regs.c
> index 2864e2e3776d..8c9f511e8322 100644
> --- a/tools/perf/arch/riscv/util/perf_regs.c
> +++ b/tools/perf/arch/riscv/util/perf_regs.c
> @@ -2,5 +2,37 @@
>  #include "../../util/perf_regs.h"
>
>  const struct sample_reg sample_reg_masks[] = {
> +	SMPL_REG(pc, PERF_REG_RISCV_PC),
> +	SMPL_REG(ra, PERF_REG_RISCV_RA),
> +	SMPL_REG(sp, PERF_REG_RISCV_SP),
> +	SMPL_REG(gp, PERF_REG_RISCV_GP),
> +	SMPL_REG(tp, PERF_REG_RISCV_TP),
> +	SMPL_REG(t0, PERF_REG_RISCV_T0),
> +	SMPL_REG(t1, PERF_REG_RISCV_T1),
> +	SMPL_REG(t2, PERF_REG_RISCV_T2),
> +	SMPL_REG(s0, PERF_REG_RISCV_S0),
> +	SMPL_REG(s1, PERF_REG_RISCV_S1),
> +	SMPL_REG(a0, PERF_REG_RISCV_A0),
> +	SMPL_REG(a1, PERF_REG_RISCV_A1),
> +	SMPL_REG(a2, PERF_REG_RISCV_A2),
> +	SMPL_REG(a3, PERF_REG_RISCV_A3),
> +	SMPL_REG(a4, PERF_REG_RISCV_A4),
> +	SMPL_REG(a5, PERF_REG_RISCV_A5),
> +	SMPL_REG(a6, PERF_REG_RISCV_A6),
> +	SMPL_REG(a7, PERF_REG_RISCV_A7),
> +	SMPL_REG(s2, PERF_REG_RISCV_S2),
> +	SMPL_REG(s3, PERF_REG_RISCV_S3),
> +	SMPL_REG(s4, PERF_REG_RISCV_S4),
> +	SMPL_REG(s5, PERF_REG_RISCV_S5),
> +	SMPL_REG(s6, PERF_REG_RISCV_S6),
> +	SMPL_REG(s7, PERF_REG_RISCV_S7),
> +	SMPL_REG(s8, PERF_REG_RISCV_S8),
> +	SMPL_REG(s9, PERF_REG_RISCV_S9),
> +	SMPL_REG(s10, PERF_REG_RISCV_S10),
> +	SMPL_REG(s11, PERF_REG_RISCV_S11),
> +	SMPL_REG(t3, PERF_REG_RISCV_T3),
> +	SMPL_REG(t4, PERF_REG_RISCV_T4),
> +	SMPL_REG(t5, PERF_REG_RISCV_T5),
> +	SMPL_REG(t6, PERF_REG_RISCV_T6),
>  	SMPL_REG_END
>  };

Edwin Török Aug. 4, 2021, 6:39 p.m. UTC | #2
On Tue, 2021-08-03 at 21:22 -0700, Palmer Dabbelt wrote:
> On Mon, 05 Jul 2021 16:25:21 PDT (-0700), edwin@etorok.net wrote:
> > For libdw-based callgraph sampling to work we need to sample
> > registers.
> > 
> > Tested on HiFive Unmatched with a trunk version of OCaml:
> 
> These generally LGTM, but I don't have patch #2.

PATCH 2/4 is here:
https://lore.kernel.org/linux-riscv/20210705232524.4024832-3-edwin@etorok.net/
https://patchwork.kernel.org/project/linux-riscv/patch/20210705232524.4024832-3-edwin@etorok.net/

>   Sometimes that means 
> they're just not targeted at the RISC-V tree, which is fine with me but
> I'm happy to take some time to look closer and take them if that's what
> you're looking for.
> A cover letter can be a good bet, to describe this sort of stuff.

Thanks for taking a look, I'll try to ensure that RISC-V maintainers
are CC-ed on the entire series in the future (I used
scripts/get_maintainer.pl which might've been too selective).

All patches should be on the RISC-V mailing list though, cover letter
included:
https://lore.kernel.org/linux-riscv/20210705232524.4024832-1-edwin@etorok.net/T/#t
https://patchwork.kernel.org/project/linux-riscv/list/?series=511107

PATCH 2/4 touches tools/perf/check-headers.sh. Although that is
technically outside the RISC-V specific tree, it only adds one line to
include the riscv headers, so if you could take all 4 patches into your
tree that'd be great.

Thanks,
--Edwin

Patch

diff --git a/tools/perf/arch/riscv/util/perf_regs.c b/tools/perf/arch/riscv/util/perf_regs.c
index 2864e2e3776d..8c9f511e8322 100644
--- a/tools/perf/arch/riscv/util/perf_regs.c
+++ b/tools/perf/arch/riscv/util/perf_regs.c
@@ -2,5 +2,37 @@ 
 #include "../../util/perf_regs.h"
 
 const struct sample_reg sample_reg_masks[] = {
+	SMPL_REG(pc, PERF_REG_RISCV_PC),
+	SMPL_REG(ra, PERF_REG_RISCV_RA),
+	SMPL_REG(sp, PERF_REG_RISCV_SP),
+	SMPL_REG(gp, PERF_REG_RISCV_GP),
+	SMPL_REG(tp, PERF_REG_RISCV_TP),
+	SMPL_REG(t0, PERF_REG_RISCV_T0),
+	SMPL_REG(t1, PERF_REG_RISCV_T1),
+	SMPL_REG(t2, PERF_REG_RISCV_T2),
+	SMPL_REG(s0, PERF_REG_RISCV_S0),
+	SMPL_REG(s1, PERF_REG_RISCV_S1),
+	SMPL_REG(a0, PERF_REG_RISCV_A0),
+	SMPL_REG(a1, PERF_REG_RISCV_A1),
+	SMPL_REG(a2, PERF_REG_RISCV_A2),
+	SMPL_REG(a3, PERF_REG_RISCV_A3),
+	SMPL_REG(a4, PERF_REG_RISCV_A4),
+	SMPL_REG(a5, PERF_REG_RISCV_A5),
+	SMPL_REG(a6, PERF_REG_RISCV_A6),
+	SMPL_REG(a7, PERF_REG_RISCV_A7),
+	SMPL_REG(s2, PERF_REG_RISCV_S2),
+	SMPL_REG(s3, PERF_REG_RISCV_S3),
+	SMPL_REG(s4, PERF_REG_RISCV_S4),
+	SMPL_REG(s5, PERF_REG_RISCV_S5),
+	SMPL_REG(s6, PERF_REG_RISCV_S6),
+	SMPL_REG(s7, PERF_REG_RISCV_S7),
+	SMPL_REG(s8, PERF_REG_RISCV_S8),
+	SMPL_REG(s9, PERF_REG_RISCV_S9),
+	SMPL_REG(s10, PERF_REG_RISCV_S10),
+	SMPL_REG(s11, PERF_REG_RISCV_S11),
+	SMPL_REG(t3, PERF_REG_RISCV_T3),
+	SMPL_REG(t4, PERF_REG_RISCV_T4),
+	SMPL_REG(t5, PERF_REG_RISCV_T5),
+	SMPL_REG(t6, PERF_REG_RISCV_T6),
 	SMPL_REG_END
 };
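
For readers unfamiliar with the table above, here is a minimal, self-contained sketch of how a name-to-bitmask table of this shape can be consulted. It is an illustration under stated assumptions, not perf's actual code. Every `demo_`/`DEMO_` name is hypothetical, the three-entry enum stands in for the full RISC-V register enum, and the macro is assumed to stringify the register name and set one bit per register index.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the RISC-V register index enum (three entries only). */
enum demo_riscv_regs {
	DEMO_REG_PC,
	DEMO_REG_RA,
	DEMO_REG_SP,
	DEMO_REG_MAX
};

/* Hypothetical stand-in for struct sample_reg: a name and a one-bit mask. */
struct demo_sample_reg {
	const char *name;
	uint64_t mask;
};

/* Assumption: the macro stringifies the name and sets bit number 'b'. */
#define DEMO_SMPL_REG(n, b) { .name = #n, .mask = 1ULL << (b) }
#define DEMO_SMPL_REG_END   { .name = NULL, .mask = 0 }

static const struct demo_sample_reg demo_reg_masks[] = {
	DEMO_SMPL_REG(pc, DEMO_REG_PC),
	DEMO_SMPL_REG(ra, DEMO_REG_RA),
	DEMO_SMPL_REG(sp, DEMO_REG_SP),
	DEMO_SMPL_REG_END
};

/* Return the mask for a register name, or 0 if the name is not in the table. */
static uint64_t demo_reg_mask(const char *name)
{
	const struct demo_sample_reg *r;

	for (r = demo_reg_masks; r->name != NULL; r++) {
		if (strcmp(r->name, name) == 0)
			return r->mask;
	}
	return 0;
}

/* OR together every mask in the table, i.e. "sample all known registers". */
static uint64_t demo_all_regs_mask(void)
{
	const struct demo_sample_reg *r;
	uint64_t mask = 0;

	for (r = demo_reg_masks; r->name != NULL; r++)
		mask |= r->mask;
	return mask;
}
```

With a table that contains only the terminator, as in the file before this patch, a lookup like `demo_reg_mask("sp")` finds no entry and `demo_all_regs_mask()` returns 0, so no registers can be requested by name. That matches the commit message's point that the registers must be listed before DWARF-based unwinding can work.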