[v9,1/2] perf, uncore: Adding documentation for ThunderX2 pmu uncore driver

Message ID 20181205105853.15154-2-ganapatrao.kulkarni@cavium.com (mailing list archive)
State New, archived
Series Add ThunderX2 SoC Performance Monitoring Unit driver

Commit Message

Kulkarni, Ganapatrao Dec. 5, 2018, 10:59 a.m. UTC
The SoC has PMU support in its L3 cache controller (L3C) and in the
DDR4 Memory Controller (DMC).

Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
---
 Documentation/perf/thunderx2-pmu.txt | 93 ++++++++++++++++++++++++++++
 1 file changed, 93 insertions(+)
 create mode 100644 Documentation/perf/thunderx2-pmu.txt

Comments

Randy Dunlap Dec. 5, 2018, 7:44 p.m. UTC | #1
Hi,
I have some documentation edits for you to consider:


On 12/5/18 2:59 AM, Kulkarni, Ganapatrao wrote:
> The SoC has PMU support in its L3 cache controller (L3C) and in the
> DDR4 Memory Controller (DMC).
> 
> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@cavium.com>
> ---
>  Documentation/perf/thunderx2-pmu.txt | 93 ++++++++++++++++++++++++++++
>  1 file changed, 93 insertions(+)
>  create mode 100644 Documentation/perf/thunderx2-pmu.txt
> 
> diff --git a/Documentation/perf/thunderx2-pmu.txt b/Documentation/perf/thunderx2-pmu.txt
> new file mode 100644
> index 000000000000..f8835bf1068c
> --- /dev/null
> +++ b/Documentation/perf/thunderx2-pmu.txt
> @@ -0,0 +1,93 @@
> +
> +Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
> +==========================================================================
> +
> +ThunderX2 SoC PMU consists of independent system wide per Socket PMUs, such
> +as Level 3 Cache(L3C) and DDR4 Memory Controller(DMC).
> +
> +The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles. Events
> +are counted for default channel(i.e channel 0) and prorated to total number of

               for the default channel (i.e. channel 0) and

> +channels/tiles.
> +
> +DMC and L3C supports up to 4 counters. Counters are independently programmable

               support

> +and can be started and stopped individually. Each counter can be set to
> +different event. Counters are 32 bit and do not support overflow interrupt;

 a different event.

> +they are read every 2 seconds.
> +
> +PMU UNCORE (perf) driver:
> +
> +The thunderx2_pmu driver registers per socket perf PMUs for DMC and L3C devices.
> +Each PMU can be used to count up to 4 events simultaneously. PMUs provide
> +description of its available events and configuration options
> +in sysfs, see /sys/devices/uncore_<l3c_S/dmc_S/>; S is the socket id.
> +
> +The driver does not support sampling, therefore "perf record" will
> +not work. Per-task perf sessions are not supported.
> +
> +Examples:
> +
> +perf stat -a -e uncore_dmc_0/cnt_cycles/ sleep 1
> +
> +perf stat -a -e \
> +uncore_dmc_0/cnt_cycles/,\
> +uncore_dmc_0/data_transfers/,\
> +uncore_dmc_0/read_txns/,\
> +uncore_dmc_0/write_txns/ sleep 1
> +
> +perf stat -a -e \
> +uncore_l3c_0/read_request/,\
> +uncore_l3c_0/read_hit/,\
> +uncore_l3c_0/inv_request/,\
> +uncore_l3c_0/inv_hit/ sleep 1
> +
> +
> +L3C events:
> +============
> +
> +read_request:
> +	Number of Read requests received by the L3 Cache.
> +	This include Read as well as Read Exclusives.

	     includes

> +
> +read_hit:
> +	Number of Read requests received by the L3 cache that were hit
> +	in the L3 (Data provided form the L3)
> +
> +writeback_request:
> +	Number of Write Backs received by the L3 Cache. These are basically
> +	the L2 Evicts and writes from the PCIe Write Cache.
> +
> +inv_nwrite_request:
> +	This is the Number of Invalidate and Write received by the L3 Cache.

	Number of Invalidate and Write requests received by the L3 Cache.

> +	Also Writes from IO that did not go through the PCIe Write Cache.
> +
> +inv_nwrite_hit
> +	This is the Number of Invalidate and Write received by the L3 Cache

	Number of Invalidate and Write requests received by the L3 Cache

> +	That were a hit in the L3 Cache.

	that

> +
> +inv_request:
> +	Number of Invalidate request received by the L3 Cache.

	                     requests

> +
> +inv_hit:
> +	Number of Invalidate request received by the L3 Cache that were a

	                     requests

> +	hit in L3.
> +
> +evict_request:
> +	Number of Evicts that the L3 generated.
> +
> +NOTE:
> +1. Granularity of all these events counter value is cache line length(64 Bytes).

                               event counter values               length (64 bytes).

> +2. L3C cache Hit Ratio = (read_hit + inv_nwrite_hit + inv_hit) / (read_request + inv_nwrite_request + inv_request)
> +
> +DMC events:
> +============
> +cnt_cycles:
> +	Count cycles (Clocks at the DMC clock rate)
> +
> +write_txns:
> +	Number of 64 Bytes write transactions received by the DMC(s)
> +
> +read_txns:
> +	Number of 64 Bytes Read transactions received by the DMC(s)
> +
> +data_transfers:
> +	Number of 64 Bytes data transferred to or from DRAM.
>
Patch

diff --git a/Documentation/perf/thunderx2-pmu.txt b/Documentation/perf/thunderx2-pmu.txt
new file mode 100644
index 000000000000..f8835bf1068c
--- /dev/null
+++ b/Documentation/perf/thunderx2-pmu.txt
@@ -0,0 +1,93 @@ 
+
+Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
+==========================================================================
+
+The ThunderX2 SoC PMU consists of independent, system-wide, per-socket PMUs such
+as the Level 3 Cache (L3C) and the DDR4 Memory Controller (DMC).
+
+The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles. Events
+are counted for the default channel (i.e. channel 0) and prorated to the total
+number of channels/tiles.
+
+The DMC and L3C each support up to 4 counters. Counters are independently
+programmable and can be started and stopped individually. Each counter can be
+set to a different event. Counters are 32-bit and do not support overflow
+interrupts; they are read every 2 seconds.
+
+PMU UNCORE (perf) driver:
+
+The thunderx2_pmu driver registers per-socket perf PMUs for DMC and L3C devices.
+Each PMU can be used to count up to 4 events simultaneously. Each PMU provides a
+description of its available events and configuration options in sysfs; see
+/sys/devices/uncore_<l3c_S/dmc_S/>, where S is the socket id.
+
+The driver does not support sampling; therefore, "perf record" will
+not work. Per-task perf sessions are not supported.
+
+Examples:
+
+perf stat -a -e uncore_dmc_0/cnt_cycles/ sleep 1
+
+perf stat -a -e \
+uncore_dmc_0/cnt_cycles/,\
+uncore_dmc_0/data_transfers/,\
+uncore_dmc_0/read_txns/,\
+uncore_dmc_0/write_txns/ sleep 1
+
+perf stat -a -e \
+uncore_l3c_0/read_request/,\
+uncore_l3c_0/read_hit/,\
+uncore_l3c_0/inv_request/,\
+uncore_l3c_0/inv_hit/ sleep 1
+
+
+L3C events:
+============
+
+read_request:
+	Number of Read requests received by the L3 Cache.
+	This includes Read as well as Read Exclusives.
+
+read_hit:
+	Number of Read requests received by the L3 Cache that were a hit
+	in the L3 (data provided from the L3).
+
+writeback_request:
+	Number of Write Backs received by the L3 Cache. These are basically
+	the L2 Evicts and writes from the PCIe Write Cache.
+
+inv_nwrite_request:
+	Number of Invalidate and Write requests received by the L3 Cache.
+	Also includes Writes from IO that did not go through the PCIe Write Cache.
+
+inv_nwrite_hit:
+	Number of Invalidate and Write requests received by the L3 Cache
+	that were a hit in the L3 Cache.
+
+inv_request:
+	Number of Invalidate requests received by the L3 Cache.
+
+inv_hit:
+	Number of Invalidate requests received by the L3 Cache that were a
+	hit in L3.
+
+evict_request:
+	Number of Evicts that the L3 generated.
+
+NOTE:
+1. Granularity of all these event counter values is cache line length (64 bytes).
+2. L3C cache Hit Ratio = (read_hit + inv_nwrite_hit + inv_hit) / (read_request + inv_nwrite_request + inv_request)
+
+DMC events:
+============
+cnt_cycles:
+	Count cycles (clocks at the DMC clock rate).
+
+write_txns:
+	Number of 64-byte write transactions received by the DMC(s).
+
+read_txns:
+	Number of 64-byte read transactions received by the DMC(s).
+
+data_transfers:
+	Number of 64-byte units of data transferred to or from DRAM.
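
For readers who want to post-process the counts, the hit ratio formula in
NOTE 2 and the 64-byte granularity of data_transfers can be turned into a
small calculation. The sketch below is not part of the patch. The function
names, the dictionary layout and the sample numbers are all hypothetical, and
it assumes the counts have already been collected with "perf stat". Since each
PMU counts at most 4 events at once, the six L3C counts may need to be
gathered in more than one run.

```python
# Illustrative post-processing of perf stat counts; not part of the driver.
# All names and sample values below are assumptions made for this sketch.

CACHE_LINE_BYTES = 64  # granularity stated in the documentation


def l3c_hit_ratio(counts):
    """Apply the NOTE 2 formula; return None when there were no requests."""
    hits = counts["read_hit"] + counts["inv_nwrite_hit"] + counts["inv_hit"]
    requests = (counts["read_request"]
                + counts["inv_nwrite_request"]
                + counts["inv_request"])
    return hits / requests if requests else None


def dmc_bytes_per_second(data_transfers, elapsed_seconds):
    """Convert a data_transfers count into bytes per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return data_transfers * CACHE_LINE_BYTES / elapsed_seconds


if __name__ == "__main__":
    # Made-up counts, only to show the call pattern.
    sample = {
        "read_request": 1000, "read_hit": 800,
        "inv_nwrite_request": 200, "inv_nwrite_hit": 100,
        "inv_request": 50, "inv_hit": 25,
    }
    print("L3C hit ratio:", l3c_hit_ratio(sample))
    print("DMC bytes/s:", dmc_bytes_per_second(1_000_000, 1.0))
```

Returning None for an empty request count avoids a division by zero when the
measured interval saw no L3C traffic.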