diff mbox series

arm64: dts: allwinner: Add cache information to the SoC dtsi for A64

Message ID 6a772756c2c677dbdaaab4a2c71a358d8e4b27e9.1714304058.git.dsimic@manjaro.org (mailing list archive)
State New
Headers show
Series arm64: dts: allwinner: Add cache information to the SoC dtsi for A64 | expand

Commit Message

Dragan Simic April 28, 2024, 11:40 a.m. UTC
Add missing cache information to the Allwinner A64 SoC dtsi, to allow
the userspace, which includes lscpu(1) that uses the virtual files provided
by the kernel under the /sys/devices/system/cpu directory, to display the
proper A64 cache information.

While there, use a more self-descriptive label for the L2 cache node, which
also makes it more consistent with other SoC dtsi files.

The cache parameters for the A64 dtsi were obtained and partially derived
by hand from the cache size and layout specifications found in the following
datasheets and technical reference manuals:

  - Allwinner A64 datasheet, version 1.1
  - ARM Cortex-A53 revision r0p3 TRM, version E

For future reference, here's a brief summary of the documentation:

  - All caches employ the 64-byte cache line length
  - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
    cache and 32 KB of L1 4-way, set-associative data cache
  - The entire SoC has 512 KB of unified L2 16-way, set-associative cache

Signed-off-by: Dragan Simic <dsimic@manjaro.org>
---
 arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 37 ++++++++++++++++---
 1 file changed, 32 insertions(+), 5 deletions(-)

Comments

Jernej Škrabec April 28, 2024, 4:19 p.m. UTC | #1
Dne nedelja, 28. april 2024 ob 13:40:35 GMT +2 je Dragan Simic napisal(a):
> Add missing cache information to the Allwinner A64 SoC dtsi, to allow
> the userspace, which includes lscpu(1) that uses the virtual files provided
> by the kernel under the /sys/devices/system/cpu directory, to display the
> proper A64 cache information.
> 
> While there, use a more self-descriptive label for the L2 cache node, which
> also makes it more consistent with other SoC dtsi files.
> 
> The cache parameters for the A64 dtsi were obtained and partially derived
> by hand from the cache size and layout specifications found in the following
> datasheets and technical reference manuals:
> 
>   - Allwinner A64 datasheet, version 1.1
>   - ARM Cortex-A53 revision r0p3 TRM, version E
> 
> For future reference, here's a brief summary of the documentation:
> 
>   - All caches employ the 64-byte cache line length
>   - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
>     cache and 32 KB of L1 4-way, set-associative data cache
>   - The entire SoC has 512 KB of unified L2 16-way, set-associative cache
> 
> Signed-off-by: Dragan Simic <dsimic@manjaro.org>

Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>

Best regards,
Jernej
Andre Przywara April 29, 2024, 10:33 a.m. UTC | #2
On Sun, 28 Apr 2024 13:40:35 +0200
Dragan Simic <dsimic@manjaro.org> wrote:

Hi,

thanks for taking care of this!

> Add missing cache information to the Allwinner A64 SoC dtsi, to allow
> the userspace, which includes lscpu(1) that uses the virtual files provided
> by the kernel under the /sys/devices/system/cpu directory, to display the
> proper A64 cache information.
> 
> While there, use a more self-descriptive label for the L2 cache node, which
> also makes it more consistent with other SoC dtsi files.
> 
> The cache parameters for the A64 dtsi were obtained and partially derived
> by hand from the cache size and layout specifications found in the following
> datasheets and technical reference manuals:
> 
>   - Allwinner A64 datasheet, version 1.1
>   - ARM Cortex-A53 revision r0p3 TRM, version E
> 
> For future reference, here's a brief summary of the documentation:
> 
>   - All caches employ the 64-byte cache line length
>   - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
>     cache and 32 KB of L1 4-way, set-associative data cache
>   - The entire SoC has 512 KB of unified L2 16-way, set-associative cache

So that looks correct when checking the manuals, and the per-CPU
entries below match both between themselves and with that description
above.
However I have some level of distrust towards the Allwinner manuals,
regarding the cache sizes (which are chosen by Allwinner).
So while I haven't measured this myself, nor checked the cache type
registers, tinymembench's memory latency test supports those sizes are
correct:
https://github.com/ssvb/tinymembench/wiki/PINE64-(Allwinner-A64)

> Signed-off-by: Dragan Simic <dsimic@manjaro.org>

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Cheers,
Andre

> ---
>  arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 37 ++++++++++++++++---
>  1 file changed, 32 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> index 57ac18738c99..86074d03afa9 100644
> --- a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> +++ b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
> @@ -51,49 +51,76 @@ cpu0: cpu@0 {
>  			device_type = "cpu";
>  			reg = <0>;
>  			enable-method = "psci";
> -			next-level-cache = <&L2>;
>  			clocks = <&ccu CLK_CPUX>;
>  			clock-names = "cpu";
>  			#cooling-cells = <2>;
> +			i-cache-size = <0x8000>;
> +			i-cache-line-size = <64>;
> +			i-cache-sets = <256>;
> +			d-cache-size = <0x8000>;
> +			d-cache-line-size = <64>;
> +			d-cache-sets = <128>;
> +			next-level-cache = <&l2_cache>;
>  		};
>  
>  		cpu1: cpu@1 {
>  			compatible = "arm,cortex-a53";
>  			device_type = "cpu";
>  			reg = <1>;
>  			enable-method = "psci";
> -			next-level-cache = <&L2>;
>  			clocks = <&ccu CLK_CPUX>;
>  			clock-names = "cpu";
>  			#cooling-cells = <2>;
> +			i-cache-size = <0x8000>;
> +			i-cache-line-size = <64>;
> +			i-cache-sets = <256>;
> +			d-cache-size = <0x8000>;
> +			d-cache-line-size = <64>;
> +			d-cache-sets = <128>;
> +			next-level-cache = <&l2_cache>;
>  		};
>  
>  		cpu2: cpu@2 {
>  			compatible = "arm,cortex-a53";
>  			device_type = "cpu";
>  			reg = <2>;
>  			enable-method = "psci";
> -			next-level-cache = <&L2>;
>  			clocks = <&ccu CLK_CPUX>;
>  			clock-names = "cpu";
>  			#cooling-cells = <2>;
> +			i-cache-size = <0x8000>;
> +			i-cache-line-size = <64>;
> +			i-cache-sets = <256>;
> +			d-cache-size = <0x8000>;
> +			d-cache-line-size = <64>;
> +			d-cache-sets = <128>;
> +			next-level-cache = <&l2_cache>;
>  		};
>  
>  		cpu3: cpu@3 {
>  			compatible = "arm,cortex-a53";
>  			device_type = "cpu";
>  			reg = <3>;
>  			enable-method = "psci";
> -			next-level-cache = <&L2>;
>  			clocks = <&ccu CLK_CPUX>;
>  			clock-names = "cpu";
>  			#cooling-cells = <2>;
> +			i-cache-size = <0x8000>;
> +			i-cache-line-size = <64>;
> +			i-cache-sets = <256>;
> +			d-cache-size = <0x8000>;
> +			d-cache-line-size = <64>;
> +			d-cache-sets = <128>;
> +			next-level-cache = <&l2_cache>;
>  		};
>  
> -		L2: l2-cache {
> +		l2_cache: l2-cache {
>  			compatible = "cache";
>  			cache-level = <2>;
>  			cache-unified;
> +			cache-size = <0x80000>;
> +			cache-line-size = <64>;
> +			cache-sets = <512>;
>  		};
>  	};
>  
>
Dragan Simic April 29, 2024, 1:51 p.m. UTC | #3
Hello Andre,

On 2024-04-29 12:33, Andre Przywara wrote:
> On Sun, 28 Apr 2024 13:40:35 +0200
> Dragan Simic <dsimic@manjaro.org> wrote:
> thanks for taking care of this!

Thank you for reviewing my patch!

>> Add missing cache information to the Allwinner A64 SoC dtsi, to allow
>> the userspace, which includes lscpu(1) that uses the virtual files 
>> provided
>> by the kernel under the /sys/devices/system/cpu directory, to display 
>> the
>> proper A64 cache information.
>> 
>> While there, use a more self-descriptive label for the L2 cache node, 
>> which
>> also makes it more consistent with other SoC dtsi files.
>> 
>> The cache parameters for the A64 dtsi were obtained and partially 
>> derived
>> by hand from the cache size and layout specifications found in the 
>> following
>> datasheets and technical reference manuals:
>> 
>>   - Allwinner A64 datasheet, version 1.1
>>   - ARM Cortex-A53 revision r0p3 TRM, version E
>> 
>> For future reference, here's a brief summary of the documentation:
>> 
>>   - All caches employ the 64-byte cache line length
>>   - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative 
>> instruction
>>     cache and 32 KB of L1 4-way, set-associative data cache
>>   - The entire SoC has 512 KB of unified L2 16-way, set-associative 
>> cache
> 
> So that looks correct when checking the manuals, and the per-CPU
> entries below match both between themselves and with that description
> above.
> However I have some level of distrust towards the Allwinner manuals,
> regarding the cache sizes (which are chosen by Allwinner).

Quite frankly, I was surprised a bit to see that the A64 contains
512 KB of L2 cache.  IMHO, that's quite a lot for an SoC that was
advertised primarily as a cost-effective solution.

> So while I haven't measured this myself, nor checked the cache type
> registers, tinymembench's memory latency test supports those sizes are
> correct:
> https://github.com/ssvb/tinymembench/wiki/PINE64-(Allwinner-A64)

Ah, that's a nice benchmark report.  Let me copy & paste the most
relevant part of that report below, just for future reference in
case that web page becomes inaccessible at some point:

==========================================================================
== Memory latency test                                                  
==
==                                                                      
==
== Average time is measured for random memory accesses in the buffers   
==
== of different sizes. The larger is the buffer, the more significant   
==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      
==
== accesses. For extremely large buffer sizes we are expecting to see   
==
== page table walk with several requests to SDRAM for almost every      
==
== memory access (though 64MiB is not nearly large enough to experience 
==
== this effect to its fullest).                                         
==
==                                                                      
==
== Note 1: All the numbers are representing extra time, which needs to  
==
==         be added to L1 cache latency. The cycle timings for L1 cache 
==
==         latency can be usually found in the processor documentation. 
==
== Note 2: Dual random read means that we are simultaneously performing 
==
==         two independent memory accesses at a time. In the case if    
==
==         the memory subsystem can't handle multiple outstanding       
==
==         requests, dual random read has the same timings as two       
==
==         single reads performed one after another.                    
==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
       1024 :    0.0 ns          /     0.0 ns
       2048 :    0.0 ns          /     0.0 ns
       4096 :    0.0 ns          /     0.0 ns
       8192 :    0.0 ns          /     0.0 ns
      16384 :    0.0 ns          /     0.0 ns
      32768 :    0.0 ns          /     0.0 ns
      65536 :    5.9 ns          /    10.0 ns
     131072 :    9.1 ns          /    14.0 ns
     262144 :   10.7 ns          /    15.5 ns
     524288 :   12.7 ns          /    17.7 ns
    1048576 :   92.8 ns          /   143.2 ns
    2097152 :  134.9 ns          /   184.4 ns
    4194304 :  163.5 ns          /   207.1 ns
    8388608 :  178.6 ns          /   217.6 ns
   16777216 :  187.5 ns          /   223.7 ns
   33554432 :  192.8 ns          /   228.0 ns
   67108864 :  195.8 ns          /   230.7 ns

block size : single random read / dual random read, [MADV_HUGEPAGE]
       1024 :    0.0 ns          /     0.0 ns
       2048 :    0.0 ns          /     0.0 ns
       4096 :    0.0 ns          /     0.0 ns
       8192 :    0.0 ns          /     0.0 ns
      16384 :    0.0 ns          /     0.0 ns
      32768 :    0.0 ns          /     0.0 ns
      65536 :    5.9 ns          /    10.0 ns
     131072 :    9.1 ns          /    14.0 ns
     262144 :   10.7 ns          /    15.6 ns
     524288 :   12.6 ns          /    17.8 ns
    1048576 :   92.7 ns          /   142.6 ns
    2097152 :  134.7 ns          /   184.3 ns
    4194304 :  155.8 ns          /   198.4 ns
    8388608 :  166.4 ns          /   203.8 ns
   16777216 :  171.6 ns          /   206.0 ns
   33554432 :  174.2 ns          /   206.9 ns
   67108864 :  175.4 ns          /   207.4 ns

>> Signed-off-by: Dragan Simic <dsimic@manjaro.org>
> 
> Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Thanks!

>> ---
>>  arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi | 37 
>> ++++++++++++++++---
>>  1 file changed, 32 insertions(+), 5 deletions(-)
>> 
>> diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi 
>> b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
>> index 57ac18738c99..86074d03afa9 100644
>> --- a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
>> +++ b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
>> @@ -51,49 +51,76 @@ cpu0: cpu@0 {
>>  			device_type = "cpu";
>>  			reg = <0>;
>>  			enable-method = "psci";
>> -			next-level-cache = <&L2>;
>>  			clocks = <&ccu CLK_CPUX>;
>>  			clock-names = "cpu";
>>  			#cooling-cells = <2>;
>> +			i-cache-size = <0x8000>;
>> +			i-cache-line-size = <64>;
>> +			i-cache-sets = <256>;
>> +			d-cache-size = <0x8000>;
>> +			d-cache-line-size = <64>;
>> +			d-cache-sets = <128>;
>> +			next-level-cache = <&l2_cache>;
>>  		};
>> 
>>  		cpu1: cpu@1 {
>>  			compatible = "arm,cortex-a53";
>>  			device_type = "cpu";
>>  			reg = <1>;
>>  			enable-method = "psci";
>> -			next-level-cache = <&L2>;
>>  			clocks = <&ccu CLK_CPUX>;
>>  			clock-names = "cpu";
>>  			#cooling-cells = <2>;
>> +			i-cache-size = <0x8000>;
>> +			i-cache-line-size = <64>;
>> +			i-cache-sets = <256>;
>> +			d-cache-size = <0x8000>;
>> +			d-cache-line-size = <64>;
>> +			d-cache-sets = <128>;
>> +			next-level-cache = <&l2_cache>;
>>  		};
>> 
>>  		cpu2: cpu@2 {
>>  			compatible = "arm,cortex-a53";
>>  			device_type = "cpu";
>>  			reg = <2>;
>>  			enable-method = "psci";
>> -			next-level-cache = <&L2>;
>>  			clocks = <&ccu CLK_CPUX>;
>>  			clock-names = "cpu";
>>  			#cooling-cells = <2>;
>> +			i-cache-size = <0x8000>;
>> +			i-cache-line-size = <64>;
>> +			i-cache-sets = <256>;
>> +			d-cache-size = <0x8000>;
>> +			d-cache-line-size = <64>;
>> +			d-cache-sets = <128>;
>> +			next-level-cache = <&l2_cache>;
>>  		};
>> 
>>  		cpu3: cpu@3 {
>>  			compatible = "arm,cortex-a53";
>>  			device_type = "cpu";
>>  			reg = <3>;
>>  			enable-method = "psci";
>> -			next-level-cache = <&L2>;
>>  			clocks = <&ccu CLK_CPUX>;
>>  			clock-names = "cpu";
>>  			#cooling-cells = <2>;
>> +			i-cache-size = <0x8000>;
>> +			i-cache-line-size = <64>;
>> +			i-cache-sets = <256>;
>> +			d-cache-size = <0x8000>;
>> +			d-cache-line-size = <64>;
>> +			d-cache-sets = <128>;
>> +			next-level-cache = <&l2_cache>;
>>  		};
>> 
>> -		L2: l2-cache {
>> +		l2_cache: l2-cache {
>>  			compatible = "cache";
>>  			cache-level = <2>;
>>  			cache-unified;
>> +			cache-size = <0x80000>;
>> +			cache-line-size = <64>;
>> +			cache-sets = <512>;
>>  		};
>>  	};
diff mbox series

Patch

diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
index 57ac18738c99..86074d03afa9 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
+++ b/arch/arm64/boot/dts/allwinner/sun50i-a64.dtsi
@@ -51,49 +51,76 @@  cpu0: cpu@0 {
 			device_type = "cpu";
 			reg = <0>;
 			enable-method = "psci";
-			next-level-cache = <&L2>;
 			clocks = <&ccu CLK_CPUX>;
 			clock-names = "cpu";
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
 		};
 
 		cpu1: cpu@1 {
 			compatible = "arm,cortex-a53";
 			device_type = "cpu";
 			reg = <1>;
 			enable-method = "psci";
-			next-level-cache = <&L2>;
 			clocks = <&ccu CLK_CPUX>;
 			clock-names = "cpu";
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
 		};
 
 		cpu2: cpu@2 {
 			compatible = "arm,cortex-a53";
 			device_type = "cpu";
 			reg = <2>;
 			enable-method = "psci";
-			next-level-cache = <&L2>;
 			clocks = <&ccu CLK_CPUX>;
 			clock-names = "cpu";
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
 		};
 
 		cpu3: cpu@3 {
 			compatible = "arm,cortex-a53";
 			device_type = "cpu";
 			reg = <3>;
 			enable-method = "psci";
-			next-level-cache = <&L2>;
 			clocks = <&ccu CLK_CPUX>;
 			clock-names = "cpu";
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
 		};
 
-		L2: l2-cache {
+		l2_cache: l2-cache {
 			compatible = "cache";
 			cache-level = <2>;
 			cache-unified;
+			cache-size = <0x80000>;
+			cache-line-size = <64>;
+			cache-sets = <512>;
 		};
 	};