mbox series

[v5,00/25] Add perf support to the rockchip-dfi driver

Message ID 20230524083153.2046084-1-s.hauer@pengutronix.de (mailing list archive)
Headers show
Series Add perf support to the rockchip-dfi driver | expand

Message

Sascha Hauer May 24, 2023, 8:31 a.m. UTC
This is v5 of the series adding perf support to the rockchip DFI driver.

A lot has changed in the perf driver since v4. First of all the review
feedback from Robin and Jonathan has been integrated. The perf driver
now not only supports monitoring the total DDR utilization, but also the
individual channels. I also reworked the way the raw 32bit counter
values are summed up to 64bit perf values, so hopefully the code is
easier to follow now.

lockdep found out that that locking in the perf driver was broken, so I
reworked that as well. None of the perf hooks allows locking with
mutexes or spinlocks, so in perf it's not possible to enable the DFI
controller when needed. Instead I now unconditionally enable the DFI
controller during probe when perf is enabled.

Furthermore the hrtimer I use for reading out the hardware counter
values before they overflow race with perf. Now a seqlock is used to
prevent that.

The RK3588 device tree changes for the DFI were not part of v4. As
Vincent Legoll showed interest in testing this series the necessary
device tree changes are now part of this series.

Changes since v4:
- Add device tree changes for RK3588
- Use seqlock to protect perf counter values from hrtimer
- Unconditionally enable DFI when perf is enabled
- Bring back changes to dts/binding patches that were lost in v4

Changes since v3:
- Add RK3588 support

Changes since v2:
- Fix broken reference to binding
- Add Reviewed-by from Rob

Changes since v1:
- Fix example to actually match the binding and fix the warnings resulted thereof
- Make addition of rockchip,rk3568-dfi an extra patch

Sascha Hauer (25):
  PM / devfreq: rockchip-dfi: Make pmu regmap mandatory
  PM / devfreq: rockchip-dfi: Embed desc into private data struct
  PM / devfreq: rockchip-dfi: use consistent name for private data
    struct
  PM / devfreq: rockchip-dfi: Add SoC specific init function
  PM / devfreq: rockchip-dfi: dfi store raw values in counter struct
  PM / devfreq: rockchip-dfi: Use free running counter
  PM / devfreq: rockchip-dfi: introduce channel mask
  PM / devfreq: rk3399_dmc,dfi: generalize DDRTYPE defines
  PM / devfreq: rockchip-dfi: Clean up DDR type register defines
  PM / devfreq: rockchip-dfi: Add RK3568 support
  PM / devfreq: rockchip-dfi: Handle LPDDR2 correctly
  PM / devfreq: rockchip-dfi: Handle LPDDR4X
  PM / devfreq: rockchip-dfi: Pass private data struct to internal
    functions
  PM / devfreq: rockchip-dfi: Prepare for multiple users
  PM / devfreq: rockchip-dfi: give variable a better name
  PM / devfreq: rockchip-dfi: Add perf support
  PM / devfreq: rockchip-dfi: make register stride SoC specific
  PM / devfreq: rockchip-dfi: account for multiple DDRMON_CTRL registers
  PM / devfreq: rockchip-dfi: add support for RK3588
  dt-bindings: devfreq: event: convert Rockchip DFI binding to yaml
  dt-bindings: devfreq: event: rockchip,dfi: Add rk3568 support
  dt-bindings: devfreq: event: rockchip,dfi: Add rk3588 support
  arm64: dts: rockchip: rk3399: Enable DFI
  arm64: dts: rockchip: rk356x: Add DFI
  arm64: dts: rockchip: rk3588s: Add DFI

 .../bindings/devfreq/event/rockchip,dfi.yaml  |  84 ++
 .../bindings/devfreq/event/rockchip-dfi.txt   |  18 -
 .../rockchip,rk3399-dmc.yaml                  |   2 +-
 arch/arm64/boot/dts/rockchip/rk3399.dtsi      |   1 -
 arch/arm64/boot/dts/rockchip/rk356x.dtsi      |   7 +
 arch/arm64/boot/dts/rockchip/rk3588s.dtsi     |  16 +
 drivers/devfreq/event/rockchip-dfi.c          | 796 +++++++++++++++---
 drivers/devfreq/rk3399_dmc.c                  |  10 +-
 include/soc/rockchip/rk3399_grf.h             |   9 +-
 include/soc/rockchip/rk3568_grf.h             |  13 +
 include/soc/rockchip/rk3588_grf.h             |  18 +
 include/soc/rockchip/rockchip_grf.h           |  18 +
 12 files changed, 854 insertions(+), 138 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/devfreq/event/rockchip,dfi.yaml
 delete mode 100644 Documentation/devicetree/bindings/devfreq/event/rockchip-dfi.txt
 create mode 100644 include/soc/rockchip/rk3568_grf.h
 create mode 100644 include/soc/rockchip/rk3588_grf.h
 create mode 100644 include/soc/rockchip/rockchip_grf.h

Comments

Sebastian Reichel June 14, 2023, 1:40 p.m. UTC | #1
Hi,

On Wed, May 24, 2023 at 10:31:28AM +0200, Sascha Hauer wrote:
> This is v5 of the series adding perf support to the rockchip DFI driver.
> 
> A lot has changed in the perf driver since v4. First of all the review
> feedback from Robin and Jonathan has been integrated. The perf driver
> now not only supports monitoring the total DDR utilization, but also the
> individual channels. I also reworked the way the raw 32bit counter
> values are summed up to 64bit perf values, so hopefully the code is
> easier to follow now.
> 
> lockdep found out that that locking in the perf driver was broken, so I
> reworked that as well. None of the perf hooks allows locking with
> mutexes or spinlocks, so in perf it's not possible to enable the DFI
> controller when needed. Instead I now unconditionally enable the DFI
> controller during probe when perf is enabled.
> 
> Furthermore the hrtimer I use for reading out the hardware counter
> values before they overflow race with perf. Now a seqlock is used to
> prevent that.
> 
> The RK3588 device tree changes for the DFI were not part of v4. As
> Vincent Legoll showed interest in testing this series the necessary
> device tree changes are now part of this series.

I tested the series on RK3588 EVB1. The read/write byts looks
sensible. Sometimes cycles reads unrealistic values, though:

 Performance counter stats for 'system wide':

18446744070475110400      rockchip_ddr/cycles/                                               
            828.63 MB   rockchip_ddr/read-bytes/                                           
            207.19 MB   rockchip_ddr/read-bytes0/                                          
            207.15 MB   rockchip_ddr/read-bytes1/                                          
            207.14 MB   rockchip_ddr/read-bytes2/                                          
            207.15 MB   rockchip_ddr/read-bytes3/                                          
              1.48 MB   rockchip_ddr/write-bytes/                                          
              0.37 MB   rockchip_ddr/write-bytes0/                                         
              0.37 MB   rockchip_ddr/write-bytes1/                                         
              0.37 MB   rockchip_ddr/write-bytes2/                                         
              0.38 MB   rockchip_ddr/write-bytes3/                                         
            830.12 MB   rockchip_ddr/bytes/                                                

       1.004239766 seconds time elapsed

(This is with memdump running in parallel)

Otherwise the series is

Tested-by: Sebastian Reichel <sebastian.reichel@collabora.com>

-- Sebastian

> Changes since v4:
> - Add device tree changes for RK3588
> - Use seqlock to protect perf counter values from hrtimer
> - Unconditionally enable DFI when perf is enabled
> - Bring back changes to dts/binding patches that were lost in v4
> 
> Changes since v3:
> - Add RK3588 support
> 
> Changes since v2:
> - Fix broken reference to binding
> - Add Reviewed-by from Rob
> 
> Changes since v1:
> - Fix example to actually match the binding and fix the warnings resulted thereof
> - Make addition of rockchip,rk3568-dfi an extra patch
> 
> Sascha Hauer (25):
>   PM / devfreq: rockchip-dfi: Make pmu regmap mandatory
>   PM / devfreq: rockchip-dfi: Embed desc into private data struct
>   PM / devfreq: rockchip-dfi: use consistent name for private data
>     struct
>   PM / devfreq: rockchip-dfi: Add SoC specific init function
>   PM / devfreq: rockchip-dfi: dfi store raw values in counter struct
>   PM / devfreq: rockchip-dfi: Use free running counter
>   PM / devfreq: rockchip-dfi: introduce channel mask
>   PM / devfreq: rk3399_dmc,dfi: generalize DDRTYPE defines
>   PM / devfreq: rockchip-dfi: Clean up DDR type register defines
>   PM / devfreq: rockchip-dfi: Add RK3568 support
>   PM / devfreq: rockchip-dfi: Handle LPDDR2 correctly
>   PM / devfreq: rockchip-dfi: Handle LPDDR4X
>   PM / devfreq: rockchip-dfi: Pass private data struct to internal
>     functions
>   PM / devfreq: rockchip-dfi: Prepare for multiple users
>   PM / devfreq: rockchip-dfi: give variable a better name
>   PM / devfreq: rockchip-dfi: Add perf support
>   PM / devfreq: rockchip-dfi: make register stride SoC specific
>   PM / devfreq: rockchip-dfi: account for multiple DDRMON_CTRL registers
>   PM / devfreq: rockchip-dfi: add support for RK3588
>   dt-bindings: devfreq: event: convert Rockchip DFI binding to yaml
>   dt-bindings: devfreq: event: rockchip,dfi: Add rk3568 support
>   dt-bindings: devfreq: event: rockchip,dfi: Add rk3588 support
>   arm64: dts: rockchip: rk3399: Enable DFI
>   arm64: dts: rockchip: rk356x: Add DFI
>   arm64: dts: rockchip: rk3588s: Add DFI
> 
>  .../bindings/devfreq/event/rockchip,dfi.yaml  |  84 ++
>  .../bindings/devfreq/event/rockchip-dfi.txt   |  18 -
>  .../rockchip,rk3399-dmc.yaml                  |   2 +-
>  arch/arm64/boot/dts/rockchip/rk3399.dtsi      |   1 -
>  arch/arm64/boot/dts/rockchip/rk356x.dtsi      |   7 +
>  arch/arm64/boot/dts/rockchip/rk3588s.dtsi     |  16 +
>  drivers/devfreq/event/rockchip-dfi.c          | 796 +++++++++++++++---
>  drivers/devfreq/rk3399_dmc.c                  |  10 +-
>  include/soc/rockchip/rk3399_grf.h             |   9 +-
>  include/soc/rockchip/rk3568_grf.h             |  13 +
>  include/soc/rockchip/rk3588_grf.h             |  18 +
>  include/soc/rockchip/rockchip_grf.h           |  18 +
>  12 files changed, 854 insertions(+), 138 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/devfreq/event/rockchip,dfi.yaml
>  delete mode 100644 Documentation/devicetree/bindings/devfreq/event/rockchip-dfi.txt
>  create mode 100644 include/soc/rockchip/rk3568_grf.h
>  create mode 100644 include/soc/rockchip/rk3588_grf.h
>  create mode 100644 include/soc/rockchip/rockchip_grf.h
> 
> -- 
> 2.39.2
>
Vincent Legoll June 14, 2023, 2:19 p.m. UTC | #2
Hello Sascha, Sebastian,

On Wed, Jun 14, 2023 at 1:40 PM Sebastian Reichel
<sebastian.reichel@collabora.com> wrote:
> On Wed, May 24, 2023 at 10:31:28AM +0200, Sascha Hauer wrote:
> > This is v5 of the series adding perf support to the rockchip DFI driver.
> > [...]
> > The RK3588 device tree changes for the DFI were not part of v4. As
> > Vincent Legoll showed interest in testing this series the necessary
> > device tree changes are now part of this series.
>
> I tested the series on RK3588 EVB1. The read/write byts looks
> sensible. Sometimes cycles reads unrealistic values, though:
> [...]
> Otherwise the series is
>
> Tested-by: Sebastian Reichel <sebastian.reichel@collabora.com>
>
> -- Sebastian

I also tested this new version of the series on a Pine64 QuartzPro64 dev board.

I applied the series on top of my local branch, which is based on Collabora's
rockchip-3588 plus some QP64 DTS patches, and your V5 patch series.

Looks like this is still working properly:

-bash-5.1# uname -a
Linux qp64 6.4.0-rc1-00140-g658dd2200e2a #24 SMP PREEMPT Wed Jun 14
15:50:34 CEST 2023 aarch64 GNU/Linux

-bash-5.1# zgrep -i _dfi /proc/config.gz
CONFIG_DEVFREQ_EVENT_ROCKCHIP_DFI=y

-bash-5.1# perf list | grep rockchip_ddr
  rockchip_ddr/bytes/                                [Kernel PMU event]
  rockchip_ddr/cycles/                               [Kernel PMU event]
  rockchip_ddr/read-bytes/                           [Kernel PMU event]
  rockchip_ddr/read-bytes0/                          [Kernel PMU event]
  rockchip_ddr/read-bytes1/                          [Kernel PMU event]
  rockchip_ddr/read-bytes2/                          [Kernel PMU event]
  rockchip_ddr/read-bytes3/                          [Kernel PMU event]
  rockchip_ddr/write-bytes/                          [Kernel PMU event]
  rockchip_ddr/write-bytes0/                         [Kernel PMU event]
  rockchip_ddr/write-bytes1/                         [Kernel PMU event]
  rockchip_ddr/write-bytes2/                         [Kernel PMU event]
  rockchip_ddr/write-bytes3/                         [Kernel PMU event]

# With no memory load
-bash-5.1# perf stat -a -e
rockchip_ddr/cycles/,rockchip_ddr/read-bytes/,rockchip_ddr/write-bytes/,rockchip_ddr/bytes/
sleep 1

 Performance counter stats for 'system wide':

        1058691047      rockchip_ddr/cycles/
              9.35 MB   rockchip_ddr/read-bytes/
              0.57 MB   rockchip_ddr/write-bytes/
              9.90 MB   rockchip_ddr/bytes/

       1.002616498 seconds time elapsed

# With a hog
-bash-5.1# memtester 4G > /dev/null 2>&1 &
-bash-5.1# perf stat -a -e
rockchip_ddr/cycles/,rockchip_ddr/read-bytes/,rockchip_ddr/write-bytes/,rockchip_ddr/bytes/
sleep 10

 Performance counter stats for 'system wide':

       10561540038      rockchip_ddr/cycles/
          60212.59 MB   rockchip_ddr/read-bytes/
          31313.03 MB   rockchip_ddr/write-bytes/
          91525.60 MB   rockchip_ddr/bytes/

      10.001651886 seconds time elapsed

You can add my T-B, for the whole series:

Tested-by: Vincent Legoll <vincent.legoll@gmail.com>

Or is there something else you want me to test ?

Thanks for your work
Regards

--
Vincent Legoll
Sebastian Reichel June 14, 2023, 3:27 p.m. UTC | #3
Hi,

On Wed, Jun 14, 2023 at 03:40:34PM +0200, Sebastian Reichel wrote:
> I tested the series on RK3588 EVB1. The read/write byts looks
> sensible. Sometimes cycles reads unrealistic values, though:
> 
> 18446744070475110400      rockchip_ddr/cycles/

I have seen this going off a few times with and without memory
pressure. If it's way off, it always seems to follow the same
pattern: The upper 32 bits are 0xffffffff instead of 0x00000000
with the lower 32 bits containing sensible data.

-- Sebastian
Vincent Legoll June 14, 2023, 7:51 p.m. UTC | #4
Hi,

On Wed, Jun 14, 2023 at 3:27 PM Sebastian Reichel
<sebastian.reichel@collabora.com> wrote:
> On Wed, Jun 14, 2023 at 03:40:34PM +0200, Sebastian Reichel wrote:
> > I tested the series on RK3588 EVB1. The read/write byts looks
> > sensible. Sometimes cycles reads unrealistic values, though:
> >
> > 18446744070475110400      rockchip_ddr/cycles/
>
> I have seen this going off a few times with and without memory
> pressure. If it's way off, it always seems to follow the same
> pattern: The upper 32 bits are 0xffffffff instead of 0x00000000
> with the lower 32 bits containing sensible data.

How often is that happening ?

I have been running this in a loop with varying sleep duration for
the last few hours without hitting it, with and without memtester.

REPEATS=1000
MAX_DURATION=100

OUT="/tmp/perf-dfi-rk3588-${REPEATS}x${MAX_DURATION}s.txt"
echo -n > $OUT

for i in $(seq $REPEATS) ; do
  DURATION=$(shuf -i "0-${MAX_DURATION}" -n 1)
  echo -n "${DURATION} : " >> $OUT
  perf stat -a -e rockchip_ddr/cycles/ sleep $DURATION 2>&1 | awk
'/rockchip_ddr/ {printf("%X\n", int($1))}' >> $OUT
done

BTW, it's on a musl-libc arm64 void linux userspace.
Sebastian Reichel June 14, 2023, 10:18 p.m. UTC | #5
Hi,

On Wed, Jun 14, 2023 at 07:51:21PM +0000, Vincent Legoll wrote:
> On Wed, Jun 14, 2023 at 3:27 PM Sebastian Reichel
> <sebastian.reichel@collabora.com> wrote:
> > On Wed, Jun 14, 2023 at 03:40:34PM +0200, Sebastian Reichel wrote:
> > > I tested the series on RK3588 EVB1. The read/write byts looks
> > > sensible. Sometimes cycles reads unrealistic values, though:
> > >
> > > 18446744070475110400      rockchip_ddr/cycles/
> >
> > I have seen this going off a few times with and without memory
> > pressure. If it's way off, it always seems to follow the same
> > pattern: The upper 32 bits are 0xffffffff instead of 0x00000000
> > with the lower 32 bits containing sensible data.
> 
> How often is that happening ?

I saw it multiple times (4 or 5) within a few tries (I guess around
20). I could see it with and without applying load on the memory.
Tests have been running globally for a second using 'sleep 1' (just
like the example from Sascha Hauer in the perf patch)

> BTW, it's on a musl-libc arm64 void linux userspace.

In my case it's Debian unstable.

-- Sebastian
Sascha Hauer June 15, 2023, 6:56 a.m. UTC | #6
Hi Sebastian,

On Wed, Jun 14, 2023 at 03:40:34PM +0200, Sebastian Reichel wrote:
> Hi,
> 
> On Wed, May 24, 2023 at 10:31:28AM +0200, Sascha Hauer wrote:
> > This is v5 of the series adding perf support to the rockchip DFI driver.
> > 
> > A lot has changed in the perf driver since v4. First of all the review
> > feedback from Robin and Jonathan has been integrated. The perf driver
> > now not only supports monitoring the total DDR utilization, but also the
> > individual channels. I also reworked the way the raw 32bit counter
> > values are summed up to 64bit perf values, so hopefully the code is
> > easier to follow now.
> > 
> > lockdep found out that that locking in the perf driver was broken, so I
> > reworked that as well. None of the perf hooks allows locking with
> > mutexes or spinlocks, so in perf it's not possible to enable the DFI
> > controller when needed. Instead I now unconditionally enable the DFI
> > controller during probe when perf is enabled.
> > 
> > Furthermore the hrtimer I use for reading out the hardware counter
> > values before they overflow race with perf. Now a seqlock is used to
> > prevent that.
> > 
> > The RK3588 device tree changes for the DFI were not part of v4. As
> > Vincent Legoll showed interest in testing this series the necessary
> > device tree changes are now part of this series.
> 
> I tested the series on RK3588 EVB1. The read/write byts looks
> sensible. Sometimes cycles reads unrealistic values, though:
> 
>  Performance counter stats for 'system wide':
> 
> 18446744070475110400      rockchip_ddr/cycles/

I'll have a look later this day. I remember seeing this, but I thought
this had been resolved already.

Thanks for your feedback so far.

Sascha
Sascha Hauer June 15, 2023, 1:27 p.m. UTC | #7
On Wed, Jun 14, 2023 at 03:40:34PM +0200, Sebastian Reichel wrote:
> Hi,
> 
> On Wed, May 24, 2023 at 10:31:28AM +0200, Sascha Hauer wrote:
> > This is v5 of the series adding perf support to the rockchip DFI driver.
> > 
> > A lot has changed in the perf driver since v4. First of all the review
> > feedback from Robin and Jonathan has been integrated. The perf driver
> > now not only supports monitoring the total DDR utilization, but also the
> > individual channels. I also reworked the way the raw 32bit counter
> > values are summed up to 64bit perf values, so hopefully the code is
> > easier to follow now.
> > 
> > lockdep found out that that locking in the perf driver was broken, so I
> > reworked that as well. None of the perf hooks allows locking with
> > mutexes or spinlocks, so in perf it's not possible to enable the DFI
> > controller when needed. Instead I now unconditionally enable the DFI
> > controller during probe when perf is enabled.
> > 
> > Furthermore the hrtimer I use for reading out the hardware counter
> > values before they overflow race with perf. Now a seqlock is used to
> > prevent that.
> > 
> > The RK3588 device tree changes for the DFI were not part of v4. As
> > Vincent Legoll showed interest in testing this series the necessary
> > device tree changes are now part of this series.
> 
> I tested the series on RK3588 EVB1. The read/write byts looks
> sensible. Sometimes cycles reads unrealistic values, though:
> 
>  Performance counter stats for 'system wide':
> 
> 18446744070475110400      rockchip_ddr/cycles/

This goes down to missing initialization of &dfi->last_perf_count, see
my other mail. Will fix this in the next round.

Sascha