Message ID | 20220824221701.41932-1-atishp@rivosinc.com (mailing list archive) |
---|---|
Series | Improve PMU support |

On Thu, Aug 25, 2022 at 8:22 AM Atish Patra <atishp@rivosinc.com> wrote:
>
> The latest version of the SBI specification includes a Performance Monitoring
> Unit (PMU) extension[1] which allows the supervisor to start/stop/configure
> various PMU events. The Sscofpmf ('Ss' for Privileged arch and Supervisor-level
> extensions, and 'cofpmf' for Count OverFlow and Privilege Mode Filtering)
> extension[2] allows perf-like tools to handle overflow interrupts and
> filtering support.
>
> This series implements the remaining PMU infrastructure to support
> PMU in the virt machine. The first seven patches from the original series
> have been merged already.
>
> This will allow us to add any PMU events in future.
> Currently, this series enables the following PMU events.
> 1. cycle count
> 2. instruction count
> 3. DTLB load/store miss
> 4. ITLB prefetch miss
>
> The first two are computed using host ticks while the last three are counted
> during cpu_tlb_fill. We can do both sampling and counting from guest userspace.
> This series has been tested on both RV64 and RV32. Both Linux[3] and OpenSBI[4]
> patches are required to get perf working.
>
> Here is the output of perf stat/report while running hackbench with the latest
> OpenSBI & Linux kernel.
>
> Perf stat:
> ==========
> [root@fedora-riscv ~]# perf stat -e cycles -e instructions -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses \
> > perf bench sched messaging -g 1 -l 10
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 1 groups == 40 processes run
>
>      Total time: 0.265 [sec]
>
>  Performance counter stats for 'perf bench sched messaging -g 1 -l 10':
>
>      4,167,825,362      cycles
>      4,166,609,256      instructions              #    1.00  insn per cycle
>          3,092,026      dTLB-load-misses
>            258,280      dTLB-store-misses
>          2,068,966      iTLB-load-misses
>
>        0.585791767 seconds time elapsed
>
>        0.373802000 seconds user
>        1.042359000 seconds sys
>
> Perf record:
> ============
> [root@fedora-riscv ~]# perf record -e cycles -e instructions \
> > -e dTLB-load-misses -e dTLB-store-misses -e iTLB-load-misses -c 10000 \
> > perf bench sched messaging -g 1 -l 10
> # Running 'sched/messaging' benchmark:
> # 20 sender and receiver processes per group
> # 1 groups == 40 processes run
>
>      Total time: 1.397 [sec]
> [ perf record: Woken up 10 times to write data ]
> Check IO/CPU overload!
> [ perf record: Captured and wrote 8.211 MB perf.data (214486 samples) ]
>
> [root@fedora-riscv riscv]# perf report
> Available samples
> 107K cycles
> 107K instructions
> 250 dTLB-load-misses
> 13 dTLB-store-misses
> 172 iTLB-load-misses
> ..
>
> Changes from v13->v14:
> 1. Added sanity check for the hashtable in pmu.c
>
> Changes from v12->v13:
> 1. Rebased on top of the apply-next.
> 2. Addressed comments about space & comment block.
>
> Changes from v11->v12:
> 1. Rebased on top of the apply-next.
> 2. Aligned the write function & .min_priv to the previous line.
> 3. Fixed the FDT generations for multi-socket scenario.
> 4. Dropped interrupt property from the DT.
> 5. Generate illegal instruction fault instead of virtual instruction fault
>    for VS/VU access while mcounteren is not set.
>
> Changes from v10->v11:
> 1. Rebased on top of the master where first 7 patches were already merged.
> 2. Removed unnecessary additional check in ctr predicate function.
> 3. Removed unnecessary priv version checks in mcountinhibit read/write.
> 4. Added Heiko's reviewed-by/tested-by tags.
>
> Changes from v8->v9:
> 1. Added the write_done flags to the vmstate.
> 2. Fixed the hpmcounter read access from M-mode.
>
> Changes from v7->v8:
> 1. Removed ordering constraints for mhpmcounter & mhpmevent.
>
> Changes from v6->v7:
> 1. Fixed all the compilation errors for the usermode.
>
> Changes from v5->v6:
> 1. Fixed compilation issue with PATCH 1.
> 2. Addressed other comments.
>
> Changes from v4->v5:
> 1. Rebased on top of the -next with following patches.
>    - isa extension
>    - priv 1.12 spec
> 2. Addressed all the comments on v4
> 3. Removed additional isa-ext DT node in favor of riscv,isa string update
>
> Changes from v3->v4:
> 1. Removed the dummy events from pmu DT node.
> 2. Fixed pmu_avail_counters mask generation.
> 3. Added a patch to simplify the predicate function for counters.
>
> Changes from v2->v3:
> 1. Addressed all the comments on PATCH1-4.
> 2. Split patch1 into two separate patches.
> 3. Added explicit comments to explain the event types in DT node.
> 4. Rebased on latest Qemu.
>
> Changes from v1->v2:
> 1. Dropped the ACKs from v1 as significant changes happened after v1.
> 2. sscofpmf support.
> 3. A generic counter management framework.
>
> [1] https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/riscv-sbi.adoc
> [2] https://drive.google.com/file/d/171j4jFjIkKdj5LWcExphq4xG_2sihbfd/edit
> [3] https://github.com/atishp04/qemu/tree/riscv_pmu_v14
>
> Atish Patra (5):
>   target/riscv: Add sscofpmf extension support
>   target/riscv: Simplify counter predicate function
>   target/riscv: Add few cache related PMU events
>   hw/riscv: virt: Add PMU DT node to the device tree
>   target/riscv: Update the privilege field for sscofpmf CSRs

Sorry, but this doesn't apply. Are you able to rebase it?

Alistair

>
>  hw/riscv/virt.c           |  16 ++
>  target/riscv/cpu.c        |  12 ++
>  target/riscv/cpu.h        |  25 +++
>  target/riscv/cpu_bits.h   |  55 +++++
>  target/riscv/cpu_helper.c |  25 +++
>  target/riscv/csr.c        | 304 +++++++++++++++++----------
>  target/riscv/machine.c    |   1 +
>  target/riscv/pmu.c        | 425 +++++++++++++++++++++++++++++++++++++-
>  target/riscv/pmu.h        |   8 +
>  9 files changed, 760 insertions(+), 111 deletions(-)
>
> --
> 2.25.1
>
>
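
For readers following the cover letter: it says the TLB-miss events are counted during cpu_tlb_fill and that perf can sample on them (the `-c 10000` run). The sketch below is one way such event counting with overflow could look. It is not the code from this series. Every name in it (`sketch_pmu`, `sketch_pmu_start`, `sketch_pmu_incr`, the event enum, the counter count) is hypothetical, a plain array stands in for the hashtable mentioned in the v14 changelog, and arming the counter at minus the sampling period is an assumption about how a guest might program it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event IDs; not the encodings used by QEMU or the SBI spec. */
enum sketch_event {
    EV_DTLB_LOAD_MISS,
    EV_DTLB_STORE_MISS,
    EV_ITLB_MISS,
    EV_MAX
};

#define NUM_CTRS     4      /* assumed number of programmable counters */
#define CTR_UNMAPPED (-1)

struct sketch_pmu {
    uint64_t value[NUM_CTRS];      /* current counter values */
    bool     overflowed[NUM_CTRS]; /* models a pending overflow interrupt */
    int      event_to_ctr[EV_MAX]; /* event -> counter index, or CTR_UNMAPPED */
};

/* Reset all counters and leave every event unmapped. */
void sketch_pmu_init(struct sketch_pmu *pmu)
{
    for (size_t i = 0; i < NUM_CTRS; i++) {
        pmu->value[i] = 0;
        pmu->overflowed[i] = false;
    }
    for (size_t e = 0; e < EV_MAX; e++) {
        pmu->event_to_ctr[e] = CTR_UNMAPPED;
    }
}

/* Map an event to a counter and arm it to overflow after `period` events. */
bool sketch_pmu_start(struct sketch_pmu *pmu, enum sketch_event ev,
                      int ctr, uint64_t period)
{
    if (ev >= EV_MAX || ctr < 0 || ctr >= NUM_CTRS || period == 0) {
        return false;
    }
    pmu->event_to_ctr[ev] = ctr;
    pmu->value[ctr] = (uint64_t)0 - period; /* wraps to 0 after `period` increments */
    pmu->overflowed[ctr] = false;
    return true;
}

/* Would be called once per miss from a (hypothetical) TLB-fill path. */
void sketch_pmu_incr(struct sketch_pmu *pmu, enum sketch_event ev)
{
    if (ev >= EV_MAX) {
        return;
    }
    int ctr = pmu->event_to_ctr[ev];
    if (ctr == CTR_UNMAPPED) {
        return; /* nobody is counting this event */
    }
    assert(ctr >= 0 && ctr < NUM_CTRS); /* sanity check on the mapping */
    pmu->value[ctr]++;
    if (pmu->value[ctr] == 0) {
        pmu->overflowed[ctr] = true; /* the point where a sample would be taken */
    }
}
```

With a period of 10000, as in the perf record run above, the overflow flag in this sketch would be set on the 10000th miss, which is roughly the point where a sampling tool would record a sample and re-arm the counter.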

On Mon, Sep 19, 2022 at 3:08 PM Alistair Francis <alistair23@gmail.com> wrote:

> On Thu, Aug 25, 2022 at 8:22 AM Atish Patra <atishp@rivosinc.com> wrote:
> > [...]
> > Atish Patra (5):
> >   target/riscv: Add sscofpmf extension support
> >   target/riscv: Simplify counter predicate function
> >   target/riscv: Add few cache related PMU events
> >   hw/riscv: virt: Add PMU DT node to the device tree
> >   target/riscv: Update the privilege field for sscofpmf CSRs
>
> Sorry, but this doesn't apply. Are you able to rebase it?
>

I am a bit confused. Your PULL request on Sep 7th already included this & sstc
series. I can see the patches in upstream qemu as well.

> Alistair