[2/4] arm: dts: exynos: add exynos5420 cpu capacity-dmips-mhz information

Message ID 20170830144120.9312-3-dietmar.eggemann@arm.com (mailing list archive)
State Not Applicable
Delegated to: Simon Horman

Commit Message

Dietmar Eggemann Aug. 30, 2017, 2:41 p.m. UTC
The following 'capacity-dmips-mhz' dt property values are used:

Cortex-A15: 1024, Cortex-A7: 539

They have been derived from the cpu_efficiency values:

Cortex-A15: 3891, Cortex-A7: 2048

by scaling them so that the Cortex-A15s (big cores) use 1024.
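
The rescaling can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not kernel code: the helper name scale_to_1024 and the dictionary keys are made up for illustration, and rounding to the nearest integer is an assumption.

```python
# Hypothetical helper (not a kernel function): rescale the cpu_efficiency
# values so that the largest one maps to 1024.
def scale_to_1024(efficiency):
    biggest = max(efficiency.values())
    # Rounding to the nearest integer is an assumption of this sketch.
    return {cpu: round(value * 1024 / biggest) for cpu, value in efficiency.items()}


# cpu_efficiency values quoted in the commit message.
cpu_efficiency = {"cortex-a15": 3891, "cortex-a7": 2048}

capacity_dmips_mhz = scale_to_1024(cpu_efficiency)
print(capacity_dmips_mhz)
```

With the two values above this yields 1024 for the Cortex-A15 and 539 for the Cortex-A7 (2048 * 1024 / 3891 is roughly 538.98).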

The cpu_efficiency values were originally derived from the "Big.LITTLE
Processing with ARM Cortex™-A15 & Cortex-A7" white paper
(http://www.cl.cam.ac.uk/~rdm34/big.LITTLE.pdf). Table 1 lists 1.9x
(3891/2048) as the Cortex-A15 vs Cortex-A7 performance ratio for the
Dhrystone benchmark.

The following platforms are affected once cpu-invariant accounting
support is re-connected to the task scheduler:

arndale-octa, peach-pi, peach-pit, smdk5420

The patch has been tested on Samsung Chromebook 2 13" (peach-pi, Exynos
5800).

$ cat /sys/devices/system/cpu/cpu*/cpu_capacity
1024
1024
1024
1024
389
389
389
389

The Cortex-A15 vs Cortex-A7 performance ratio is 1024/389 = 2.63.

The values derived with the 'cpu_efficiency/clock-frequency dt property'
solution are:

$ cat /sys/devices/system/cpu/cpu*/cpu_capacity
1535
1535
1535
1535
448
448
448
448

The Cortex-A15 vs Cortex-A7 performance ratio is 1535/448 = 3.43.

The discrepancy between 2.63 and 3.43 arises because the
'cpu_efficiency/clock-frequency dt property' solution wrongly assumes
that the max cpu frequency of the little cpus is 1 GHz. The Cortex-A7
cluster actually runs with a max cpu frequency of 1.3 GHz, whereas its
'clock-frequency' property value is set to 1 GHz.

3.43/1.3 = 2.64

$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
1800000
1800000
1800000
1800000
1300000 <-- max cpu frequency of the Cortex-A7s (little cores)
1300000
1300000
1300000
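
The effect of the wrong little-cpu frequency can be checked with a small calculation. The sketch below assumes that the performance ratio is simply proportional to efficiency times max frequency; the function performance_ratio is hypothetical and ignores the integer truncation that the kernel applies to the capacity values.

```python
# Hypothetical model (an assumption of this sketch): performance is
# proportional to cpu_efficiency * max frequency.
def performance_ratio(big_eff, little_eff, big_mhz, little_mhz):
    return (big_eff * big_mhz) / (little_eff * little_mhz)


# Little cpus assumed to top out at 1 GHz ('clock-frequency' dt property).
assumed = performance_ratio(3891, 2048, 1800, 1000)

# Little cpus at their real max frequency of 1.3 GHz (scaling_max_freq).
actual = performance_ratio(3891, 2048, 1800, 1300)

print(f"assumed 1.0 GHz: {assumed:.2f}, actual 1.3 GHz: {actual:.2f}")
```

Without truncation the first ratio comes out at about 3.42 instead of the 3.43 computed from the integer values 1535 and 448. The second ratio is about 2.63, in line with the capacity-dmips-mhz result.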

Running another benchmark (single-threaded sysbench affine to the
individual cpus) with performance cpufreq governor on the Samsung
Chromebook 2 13" showed the following numbers:

$ for i in $(seq 0 7); do taskset -c $i sysbench --test=cpu \
  --num-threads=1 --max-time=10 run | grep "total number of events:";
  done

total number of events: 1083
total number of events: 1085
total number of events: 1085
total number of events: 1085
total number of events: 454
total number of events: 454
total number of events: 454
total number of events: 454

The Cortex-A15 vs Cortex-A7 performance ratio is 2.39 (1085/454), i.e.
close to the Dhrystone-based ratio (2.63) derived from the "Big.LITTLE
Processing with ARM Cortex™-A15 & Cortex-A7" white paper.

We don't aim for exact cpu capacity values. Besides the CPI (Cycles Per
Instruction), the instruction mix and whether the system runs cpu-bound
or memory-bound also have an impact on the cpu capacity values derived
from these benchmark results.
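
For reference, the sysbench ratio can be recomputed from the event counts listed above. This is a minimal sketch; averaging the per-cpu counts within each cluster is an assumption, and the variable names are illustrative only.

```python
from statistics import mean

# "total number of events" per cpu, copied from the sysbench run above.
big_events = [1083, 1085, 1085, 1085]     # cpu0-3, Cortex-A15
little_events = [454, 454, 454, 454]      # cpu4-7, Cortex-A7

# Assumption: compare the clusters by their mean event count.
sysbench_ratio = mean(big_events) / mean(little_events)

# Ratio of the cpu_capacity values reported with capacity-dmips-mhz.
capacity_ratio = 1024 / 389

print(f"sysbench: {sysbench_ratio:.2f}, cpu_capacity: {capacity_ratio:.2f}")
```

The two ratios differ by roughly 10%, which is acceptable given that exact capacity values are not the goal.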

Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
---
 arch/arm/boot/dts/exynos5420-cpus.dtsi | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Krzysztof Kozlowski Aug. 30, 2017, 8:26 p.m. UTC | #1
On Wed, Aug 30, 2017 at 03:41:18PM +0100, Dietmar Eggemann wrote:
> The following 'capacity-dmips-mhz' dt property values are used:
> 
> Cortex-A15: 1024, Cortex-A7: 539
> 
> They have been derived from the cpu_efficiency values:
> 
> Cortex-A15: 3891, Cortex-A7: 2048
> 
> by scaling them so that the Cortex-A15s (big cores) use 1024.
> 
> The cpu_efficiency values were originally derived from the "Big.LITTLE
> Processing with ARM Cortex™-A15 & Cortex-A7" white paper
> (http://www.cl.cam.ac.uk/~rdm34/big.LITTLE.pdf). Table 1 lists 1.9x
> (3891/2048) as the Cortex-A15 vs Cortex-A7 performance ratio for the
> Dhrystone benchmark.
> 
> The following platforms are affected once cpu-invariant accounting
> support is re-connected to the task scheduler:
> 
> arndale-octa, peach-pi, peach-pit, smdk5420
> 
> The patch has been tested on Samsung Chromebook 2 13" (peach-pi, Exynos
> 5800).
> 
> $ cat /sys/devices/system/cpu/cpu*/cpu_capacity
> 1024
> 1024
> 1024
> 1024
> 389
> 389
> 389
> 389

I am missing something... shouldn't this be 539? Or is it scaled with
the clock-frequency (1 GHz) value?


Best regards,
Krzysztof


> 
> The Cortex-A15 vs Cortex-A7 performance ratio is 1024/389 = 2.63.
> 
> The values derived with the 'cpu_efficiency/clock-frequency dt property'
> solution are:
> 
> $ cat /sys/devices/system/cpu/cpu*/cpu_capacity
> 1535
> 1535
> 1535
> 1535
> 448
> 448
> 448
> 448
> 
> The Cortex-A15 vs Cortex-A7 performance ratio is 1535/448 = 3.43.
> 
> The discrepancy between 2.63 and 3.43 is due to the false assumption
> when using the 'cpu_efficiency/clock-frequency dt property' solution
> that the max cpu frequency of the little cpus is 1 GHZ and not 1.3 GHz.
> The Cortex-A7 cluster runs with a max cpu frequency of 1.3 GHZ whereas
> the 'clock-frequency' property value is set to 1 GHz.
> 
> 3.43/1.3 = 2.64
> 
> $ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
> 1800000
> 1800000
> 1800000
> 1800000
> 1300000 <-- max cpu frequency of the Cortex-A7s (little cores)
> 1300000
> 1300000
> 1300000
> 
> Running another benchmark (single-threaded sysbench affine to the
> individual cpus) with performance cpufreq governor on the Samsung
> Chromebook 2 13" showed the following numbers:
> 
> $ for i in `seq 0 7`; do taskset -c $i sysbench --test=cpu
>   --num-threads=1 --max-time=10 run | grep "total number of events:";
>   done
> 
> total number of events: 1083
> total number of events: 1085
> total number of events: 1085
> total number of events: 1085
> total number of events: 454
> total number of events: 454
> total number of events: 454
> total number of events: 454
> 
> The Cortex-A15 vs Cortex-A7 performance ratio is 2.39, i.e. very close
> to the one derived from the Dhrystone based one of the "Big.LITTLE
> Processing with ARM Cortex™-A15 & Cortex-A7" white paper (2.63).
> 
> We don't aim for exact values for the cpu capacity values. Besides the
> CPI (Cycles Per Instruction), the instruction mix and whether the system
> runs cpu-bound or memory-bound has an impact on the cpu capacity values
> derived from these benchmark results.
> 
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Kukjin Kim <kgene@kernel.org>
> Cc: Krzysztof Kozlowski <krzk@kernel.org>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  arch/arm/boot/dts/exynos5420-cpus.dtsi | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/exynos5420-cpus.dtsi b/arch/arm/boot/dts/exynos5420-cpus.dtsi
> index 5c052d7ff554..d7d703aa1699 100644
> --- a/arch/arm/boot/dts/exynos5420-cpus.dtsi
> +++ b/arch/arm/boot/dts/exynos5420-cpus.dtsi
> @@ -36,6 +36,7 @@
>  			cooling-min-level = <0>;
>  			cooling-max-level = <11>;
>  			#cooling-cells = <2>; /* min followed by max */
> +			capacity-dmips-mhz = <1024>;
>  		};
>  
>  		cpu1: cpu@1 {
> @@ -48,6 +49,7 @@
>  			cooling-min-level = <0>;
>  			cooling-max-level = <11>;
>  			#cooling-cells = <2>; /* min followed by max */
> +			capacity-dmips-mhz = <1024>;
>  		};
>  
>  		cpu2: cpu@2 {
> @@ -60,6 +62,7 @@
>  			cooling-min-level = <0>;
>  			cooling-max-level = <11>;
>  			#cooling-cells = <2>; /* min followed by max */
> +			capacity-dmips-mhz = <1024>;
>  		};
>  
>  		cpu3: cpu@3 {
> @@ -72,6 +75,7 @@
>  			cooling-min-level = <0>;
>  			cooling-max-level = <11>;
>  			#cooling-cells = <2>; /* min followed by max */
> +			capacity-dmips-mhz = <1024>;
>  		};
>  
>  		cpu4: cpu@100 {
> @@ -85,6 +89,7 @@
>  			cooling-min-level = <0>;
>  			cooling-max-level = <7>;
>  			#cooling-cells = <2>; /* min followed by max */
> +			capacity-dmips-mhz = <539>;
>  		};
>  
>  		cpu5: cpu@101 {
> @@ -97,6 +102,7 @@
>  			cooling-min-level = <0>;
>  			cooling-max-level = <7>;
>  			#cooling-cells = <2>; /* min followed by max */
> +			capacity-dmips-mhz = <539>;
>  		};
>  
>  		cpu6: cpu@102 {
> @@ -109,6 +115,7 @@
>  			cooling-min-level = <0>;
>  			cooling-max-level = <7>;
>  			#cooling-cells = <2>; /* min followed by max */
> +			capacity-dmips-mhz = <539>;
>  		};
>  
>  		cpu7: cpu@103 {
> @@ -121,6 +128,7 @@
>  			cooling-min-level = <0>;
>  			cooling-max-level = <7>;
>  			#cooling-cells = <2>; /* min followed by max */
> +			capacity-dmips-mhz = <539>;
>  		};
>  	};
>  };
> -- 
> 2.11.0
>
Dietmar Eggemann Aug. 31, 2017, 10:36 a.m. UTC | #2
On 30/08/17 21:26, Krzysztof Kozlowski wrote:
> On Wed, Aug 30, 2017 at 03:41:18PM +0100, Dietmar Eggemann wrote:
>> The following 'capacity-dmips-mhz' dt property values are used:
>>
>> Cortex-A15: 1024, Cortex-A7: 539
>>
>> They have been derived from the cpu_efficiency values:
>>
>> Cortex-A15: 3891, Cortex-A7: 2048
>>
>> by scaling them so that the Cortex-A15s (big cores) use 1024.
>>
>> The cpu_efficiency values were originally derived from the "Big.LITTLE
>> Processing with ARM Cortex™-A15 & Cortex-A7" white paper
>> (http://www.cl.cam.ac.uk/~rdm34/big.LITTLE.pdf). Table 1 lists 1.9x
>> (3891/2048) as the Cortex-A15 vs Cortex-A7 performance ratio for the
>> Dhrystone benchmark.
>>
>> The following platforms are affected once cpu-invariant accounting
>> support is re-connected to the task scheduler:
>>
>> arndale-octa, peach-pi, peach-pit, smdk5420
>>
>> The patch has been tested on Samsung Chromebook 2 13" (peach-pi, Exynos
>> 5800).
>>
>> $ cat /sys/devices/system/cpu/cpu*/cpu_capacity
>> 1024
>> 1024
>> 1024
>> 1024
>> 389
>> 389
>> 389
>> 389
> 
> I am missing something... shouldn't this be 539? Or is it scaled with
> the clock-frequency (1 GHz) value?

Yeah, the capacity-dmips-mhz dt value of 539 for the little cpus is
scaled by 1.3/1.8 (max cpu frequency / system-wide max cpu frequency):

539 * 1.3/1.8 = 389

This max cpu frequency scaling is part of both solutions, the 'cpu
capacity-dmips-mhz' and the 'cpu_efficiency/clock-frequency dt property'
one.

The (original*) cpu capacity on a heterogeneous platform expresses uArch
and max cpu frequency differences between the (logical) cpus of the
system.

* not further reduced by rt and/or irq pressure.
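
The scaling described in this reply can be sketched in Python as follows. This is not the kernel implementation: the function cpu_capacities is hypothetical, and both the normalisation to 1024 and the integer truncation are assumptions chosen to match the sysfs output quoted above.

```python
# Hypothetical sketch: weight each capacity-dmips-mhz value by the cpu's
# max frequency, then normalise so the most capable cpu reports 1024.
def cpu_capacities(cpus):
    """cpus: list of (capacity_dmips_mhz, max_freq_khz) tuples."""
    raw = [dmips * freq_khz for dmips, freq_khz in cpus]
    top = max(raw)
    # Integer (truncating) division is an assumption of this sketch.
    return [value * 1024 // top for value in raw]


# peach-pi: four Cortex-A15 at 1.8 GHz, four Cortex-A7 at 1.3 GHz.
peach_pi = [(1024, 1800000)] * 4 + [(539, 1300000)] * 4

print(cpu_capacities(peach_pi))
```

For the little cpus this evaluates 539 * 13 / 18, which truncates to 389.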

[...]
Krzysztof Kozlowski Sept. 3, 2017, 7:56 p.m. UTC | #3
On Thu, Aug 31, 2017 at 11:36:07AM +0100, Dietmar Eggemann wrote:
> On 30/08/17 21:26, Krzysztof Kozlowski wrote:
> > On Wed, Aug 30, 2017 at 03:41:18PM +0100, Dietmar Eggemann wrote:
> >> The following 'capacity-dmips-mhz' dt property values are used:
> >>
> >> Cortex-A15: 1024, Cortex-A7: 539
> >>
> >> They have been derived from the cpu_efficiency values:
> >>
> >> Cortex-A15: 3891, Cortex-A7: 2048
> >>
> >> by scaling them so that the Cortex-A15s (big cores) use 1024.
> >>
> >> The cpu_efficiency values were originally derived from the "Big.LITTLE
> >> Processing with ARM Cortex™-A15 & Cortex-A7" white paper
> >> (http://www.cl.cam.ac.uk/~rdm34/big.LITTLE.pdf). Table 1 lists 1.9x
> >> (3891/2048) as the Cortex-A15 vs Cortex-A7 performance ratio for the
> >> Dhrystone benchmark.
> >>
> >> The following platforms are affected once cpu-invariant accounting
> >> support is re-connected to the task scheduler:
> >>
> >> arndale-octa, peach-pi, peach-pit, smdk5420
> >>
> >> The patch has been tested on Samsung Chromebook 2 13" (peach-pi, Exynos
> >> 5800).
> >>
> >> $ cat /sys/devices/system/cpu/cpu*/cpu_capacity
> >> 1024
> >> 1024
> >> 1024
> >> 1024
> >> 389
> >> 389
> >> 389
> >> 389
> > 
> > I am missing something... shouldn't this be 539? Or is it scaled with
> > the clock-frequency (1 GHz) value?
> 
> Yeah, the capacity-dmips-mhz dt value of 539 for the little cpus is
> scaled by 1.3/1.8 (max cpu capacity/ system wide max cpu capacity):
> 
> 539 * 1.3/1.8 = 389
> 
> This max cpu capacity scaling is part of both solutions, the 'cpu
> capacity-dmips-mhz' and the 'cpu_efficiency/clock-frequency dt property'
> one.
> 
> The (original*) cpu capacity on a heterogeneous platform expresses uArch
> and max cpu frequency differences between the (logical) cpus of the
> system.
> 
> * not further reduced by rt and/or irq pressure.
> 
> [...]

Thanks for the explanation, looks fine to me. I'll take it after the
merge window.

Best regards,
Krzysztof
Dietmar Eggemann Sept. 6, 2017, 11:47 a.m. UTC | #4
On 03/09/17 20:56, Krzysztof Kozlowski wrote:
> On Thu, Aug 31, 2017 at 11:36:07AM +0100, Dietmar Eggemann wrote:
>> On 30/08/17 21:26, Krzysztof Kozlowski wrote:
>>> On Wed, Aug 30, 2017 at 03:41:18PM +0100, Dietmar Eggemann wrote:

[...]

>>>> The patch has been tested on Samsung Chromebook 2 13" (peach-pi, Exynos
>>>> 5800).
>>>>
>>>> $ cat /sys/devices/system/cpu/cpu*/cpu_capacity
>>>> 1024
>>>> 1024
>>>> 1024
>>>> 1024
>>>> 389
>>>> 389
>>>> 389
>>>> 389
>>>
>>> I am missing something... shouldn't this be 539? Or is it scaled with
>>> the clock-frequency (1 GHz) value?
>>
>> Yeah, the capacity-dmips-mhz dt value of 539 for the little cpus is
>> scaled by 1.3/1.8 (max cpu capacity/ system wide max cpu capacity):
>>
>> 539 * 1.3/1.8 = 389
>>
>> This max cpu capacity scaling is part of both solutions, the 'cpu
>> capacity-dmips-mhz' and the 'cpu_efficiency/clock-frequency dt property'
>> one.
>>
>> The (original*) cpu capacity on a heterogeneous platform expresses uArch
>> and max cpu frequency differences between the (logical) cpus of the
>> system.
>>
>> * not further reduced by rt and/or irq pressure.
>>
>> [...]
> 
> Thanks for explanation, looks fine for me. I'll take it after merge
> window.

Nice. Since the 'cpu capacity-dmips-mhz' property is already supported
for arm (and used by TC2 (vexpress-v2p-ca15_a7.dts)), this can be done
independently of the actual removal of the
'cpu_efficiency/clock-frequency dt property' solution in patch 1/4.

[..]
Krzysztof Kozlowski Sept. 17, 2017, 7:37 a.m. UTC | #5
On Wed, Aug 30, 2017 at 03:41:18PM +0100, Dietmar Eggemann wrote:
> The following 'capacity-dmips-mhz' dt property values are used:
> 
> Cortex-A15: 1024, Cortex-A7: 539
> 
> They have been derived from the cpu_efficiency values:
> 
> Cortex-A15: 3891, Cortex-A7: 2048
> 
> by scaling them so that the Cortex-A15s (big cores) use 1024.
> 
> The cpu_efficiency values were originally derived from the "Big.LITTLE
> Processing with ARM Cortex™-A15 & Cortex-A7" white paper
> (http://www.cl.cam.ac.uk/~rdm34/big.LITTLE.pdf). Table 1 lists 1.9x
> (3891/2048) as the Cortex-A15 vs Cortex-A7 performance ratio for the
> Dhrystone benchmark.
> 
> The following platforms are affected once cpu-invariant accounting
> support is re-connected to the task scheduler:
> 
> arndale-octa, peach-pi, peach-pit, smdk5420
> 
> The patch has been tested on Samsung Chromebook 2 13" (peach-pi, Exynos
> 5800).
> 
> $ cat /sys/devices/system/cpu/cpu*/cpu_capacity
> 1024
> 1024
> 1024
> 1024
> 389
> 389
> 389
> 389
> 
> The Cortex-A15 vs Cortex-A7 performance ratio is 1024/389 = 2.63.
> 
> The values derived with the 'cpu_efficiency/clock-frequency dt property'
> solution are:
> 
> $ cat /sys/devices/system/cpu/cpu*/cpu_capacity
> 1535
> 1535
> 1535
> 1535
> 448
> 448
> 448
> 448
> 
> The Cortex-A15 vs Cortex-A7 performance ratio is 1535/448 = 3.43.
> 
> The discrepancy between 2.63 and 3.43 is due to the false assumption
> when using the 'cpu_efficiency/clock-frequency dt property' solution
> that the max cpu frequency of the little cpus is 1 GHZ and not 1.3 GHz.
> The Cortex-A7 cluster runs with a max cpu frequency of 1.3 GHZ whereas
> the 'clock-frequency' property value is set to 1 GHz.
> 
> 3.43/1.3 = 2.64
> 
> $ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
> 1800000
> 1800000
> 1800000
> 1800000
> 1300000 <-- max cpu frequency of the Cortex-A7s (little cores)
> 1300000
> 1300000
> 1300000
> 
> Running another benchmark (single-threaded sysbench affine to the
> individual cpus) with performance cpufreq governor on the Samsung
> Chromebook 2 13" showed the following numbers:
> 
> $ for i in `seq 0 7`; do taskset -c $i sysbench --test=cpu
>   --num-threads=1 --max-time=10 run | grep "total number of events:";
>   done
> 
> total number of events: 1083
> total number of events: 1085
> total number of events: 1085
> total number of events: 1085
> total number of events: 454
> total number of events: 454
> total number of events: 454
> total number of events: 454
> 
> The Cortex-A15 vs Cortex-A7 performance ratio is 2.39, i.e. very close
> to the one derived from the Dhrystone based one of the "Big.LITTLE
> Processing with ARM Cortex™-A15 & Cortex-A7" white paper (2.63).
> 
> We don't aim for exact values for the cpu capacity values. Besides the
> CPI (Cycles Per Instruction), the instruction mix and whether the system
> runs cpu-bound or memory-bound has an impact on the cpu capacity values
> derived from these benchmark results.
> 
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: Kukjin Kim <kgene@kernel.org>
> Cc: Krzysztof Kozlowski <krzk@kernel.org>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
> ---
>  arch/arm/boot/dts/exynos5420-cpus.dtsi | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 

Thanks, applied (with s/arm/ARM/ change in subject).

Best regards,
Krzysztof

Patch

diff --git a/arch/arm/boot/dts/exynos5420-cpus.dtsi b/arch/arm/boot/dts/exynos5420-cpus.dtsi
index 5c052d7ff554..d7d703aa1699 100644
--- a/arch/arm/boot/dts/exynos5420-cpus.dtsi
+++ b/arch/arm/boot/dts/exynos5420-cpus.dtsi
@@ -36,6 +36,7 @@ 
 			cooling-min-level = <0>;
 			cooling-max-level = <11>;
 			#cooling-cells = <2>; /* min followed by max */
+			capacity-dmips-mhz = <1024>;
 		};
 
 		cpu1: cpu@1 {
@@ -48,6 +49,7 @@ 
 			cooling-min-level = <0>;
 			cooling-max-level = <11>;
 			#cooling-cells = <2>; /* min followed by max */
+			capacity-dmips-mhz = <1024>;
 		};
 
 		cpu2: cpu@2 {
@@ -60,6 +62,7 @@ 
 			cooling-min-level = <0>;
 			cooling-max-level = <11>;
 			#cooling-cells = <2>; /* min followed by max */
+			capacity-dmips-mhz = <1024>;
 		};
 
 		cpu3: cpu@3 {
@@ -72,6 +75,7 @@ 
 			cooling-min-level = <0>;
 			cooling-max-level = <11>;
 			#cooling-cells = <2>; /* min followed by max */
+			capacity-dmips-mhz = <1024>;
 		};
 
 		cpu4: cpu@100 {
@@ -85,6 +89,7 @@ 
 			cooling-min-level = <0>;
 			cooling-max-level = <7>;
 			#cooling-cells = <2>; /* min followed by max */
+			capacity-dmips-mhz = <539>;
 		};
 
 		cpu5: cpu@101 {
@@ -97,6 +102,7 @@ 
 			cooling-min-level = <0>;
 			cooling-max-level = <7>;
 			#cooling-cells = <2>; /* min followed by max */
+			capacity-dmips-mhz = <539>;
 		};
 
 		cpu6: cpu@102 {
@@ -109,6 +115,7 @@ 
 			cooling-min-level = <0>;
 			cooling-max-level = <7>;
 			#cooling-cells = <2>; /* min followed by max */
+			capacity-dmips-mhz = <539>;
 		};
 
 		cpu7: cpu@103 {
@@ -121,6 +128,7 @@ 
 			cooling-min-level = <0>;
 			cooling-max-level = <7>;
 			#cooling-cells = <2>; /* min followed by max */
+			capacity-dmips-mhz = <539>;
 		};
 	};
 };