
arm64: dts: qcom: x1e80100: Add performance hint for boost clock

Message ID 20241025031257.6284-2-c@jia.je
State New
Series arm64: dts: qcom: x1e80100: Add performance hint for boost clock

Commit Message

Jiajie Chen Oct. 25, 2024, 3:12 a.m. UTC
The x1e80100 CPU can have up to two cores running at 4.0 GHz, with one
core in the second cluster (cores 4-7) and the other in the third
cluster (cores 8-11). However, the scheduler is currently unaware of
this, so a single-core benchmark may end up running at 3.4 GHz when
scheduled on the first cluster.

This patch adds a capacity-dmips-mhz property to each CPU node in the
DTS. CPUs 4 and 8 get a capacity of 1200, while all other CPUs get
1024, so that the scheduler prefers those two cores. The value 1200
approximates `1024*4.0/3.4` (about 1205).
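The 1200 figure is plain proportional scaling of the default capacity by the clock ratio. A minimal sketch of that arithmetic, assuming (as the paragraph does) that performance grows linearly with frequency; `boost_capacity()` is a hypothetical helper, not anything in the kernel:

```c
#include <assert.h>

/*
 * Hypothetical helper: scale a base capacity by the ratio of the boost
 * clock to the regular maximum clock. Assumes performance is linear in
 * frequency, which is the simplification used in the commit message.
 */
static double boost_capacity(double base_capacity, double base_ghz,
			     double boost_ghz)
{
	assert(base_ghz > 0.0);
	return base_capacity * boost_ghz / base_ghz;
}
```

Calling `boost_capacity(1024.0, 3.4, 4.0)` gives roughly 1204.7, which the patch rounds to 1200.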

Note that capacity-dmips-mhz is not ideally suited for this purpose, as
it was designed to differentiate between performance and efficiency
cores, not to describe core boosting. According to its definition,
DMIPS/MHz actually decreases at higher frequencies. However, since the
CPU does not support AMU, and no elegant solution was found, this
approach is used as a workaround.
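To see why a per-MHz value can steer the scheduler at all: the property is multiplied by each CPU's reference frequency and the results are rescaled so the fastest CPU gets 1024. The sketch below is loosely modelled on the kernel's `topology_normalize_cpu_scale()` and is not kernel code. It also reflects our reading of the arch_topology parser: if any CPU lacks the property, it is ignored for all CPUs, which is why every other core is given an explicit 1024. `normalize_capacity()` is a hypothetical name, and `0` stands in for "property missing".

```c
#include <assert.h>
#include <stdint.h>

/* Full-scale scheduler capacity; the kernel uses the same value. */
#define SCHED_CAPACITY_SCALE 1024u

/*
 * Hypothetical sketch: turn per-CPU capacity-dmips-mhz values into
 * normalized scheduler capacities. dmips_mhz[i] == 0 means the property
 * is missing on CPU i; then every CPU keeps the full-scale default.
 */
static void normalize_capacity(const uint32_t *dmips_mhz,
			       const uint32_t *freq_ref_khz,
			       uint32_t *capacity, int ncpus)
{
	uint64_t max_raw = 0;

	assert(ncpus > 0);

	/* Partial information: ignore the property on all CPUs. */
	for (int i = 0; i < ncpus; i++) {
		if (dmips_mhz[i] == 0) {
			for (int j = 0; j < ncpus; j++)
				capacity[j] = SCHED_CAPACITY_SCALE;
			return;
		}
	}

	/* Raw capacity = per-MHz performance times reference frequency. */
	for (int i = 0; i < ncpus; i++) {
		uint64_t raw = (uint64_t)dmips_mhz[i] * freq_ref_khz[i];

		if (raw > max_raw)
			max_raw = raw;
	}
	assert(max_raw > 0);

	/* Rescale so the fastest CPU ends up at SCHED_CAPACITY_SCALE. */
	for (int i = 0; i < ncpus; i++) {
		uint64_t raw = (uint64_t)dmips_mhz[i] * freq_ref_khz[i];

		capacity[i] = (uint32_t)(raw * SCHED_CAPACITY_SCALE / max_raw);
	}
}
```

With the patch's values and, hypothetically, the same reference frequency on every CPU, CPUs 4 and 8 come out at 1024 and the others at 873. Real reference frequencies come from cpufreq and may differ per cluster.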

With this patch, we observe two cores running at the full 4.0 GHz
without core binding. The Geekbench 6 single-core score increases from
2452 to 2892, in both cases without core binding. Tested on a Surface
Laptop 7.
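As a quick sanity check, the reported improvement can be put next to the clock ratio: 2892 / 2452 is a gain of about 17.9%, while 4.0 / 3.4 is about 17.6%, which is consistent with the benchmark now landing on a boost core. A trivial sketch reproducing those figures; `relative_gain()` is a hypothetical helper:

```c
#include <assert.h>

/* Relative change from before to after, e.g. 0.18 for an 18% gain. */
static double relative_gain(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before;
}
```

`relative_gain(2452, 2892)` returns about 0.179 and `relative_gain(3.4, 4.0)` about 0.176.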

Signed-off-by: Jiajie Chen <c@jia.je>
---
 arch/arm64/boot/dts/qcom/x1e80100.dtsi | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Marc Zyngier Oct. 25, 2024, 7:58 a.m. UTC | #1
On Fri, 25 Oct 2024 04:12:58 +0100,
Jiajie Chen <c@jia.je> wrote:
> 
> The x1e80100 CPU can have up to two cores running at 4.0 GHz, with one
> core in the second cluster (cores 4-7) and the other in the third
> cluster (cores 8-11). However, the scheduler is currently unaware of
> this, leading to scenarios where a single core benchmark might run at
> 3.4 GHz when scheduled to the first cluster.
> 
> This patch introduces capacity-dmips-mhz nodes to each CPU node in the
> DTS. For cores numbered 4 and 8, the capacities are set to 1200, while
> others are set to 1024. This ensures that the two cores can be
> prioritized for scheduling. The value 1200 is derived from approximately
> `1024/3.4*4.0`.
> 
> Note that capacity-dmips-mhz is not ideally suited for this purpose, as
> it was designed to differentiate between performance and efficient
> cores, not for core boosting. According to its definition, DMIPS/MHz
> actually decreases with higher frequencies. However, since the CPU does
> not support AMU, and no elegant solution was found, this approach is
> used as a workaround.

Are you sure?

[    0.570323] CPU features: detected: Activity Monitors Unit (AMU) on CPU0-11

So activity monitors are available. Not that what you have here is not
useful, but this comment seems a bit... surprising.

Thanks,

	M.
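The boot message quoted above comes from the kernel's CPU feature detection, which reads the AMU field of the ID_AA64PFR0_EL1 system register (bits [47:44]). As far as we know, AMU is not exported as a userspace hwcap, so it does not appear in the Features line of /proc/cpuinfo. The kernel uses the activity monitors for frequency invariance rather than to describe per-core capacity, which may explain why their presence alone did not change the scheduling here. A minimal sketch decoding the field from a register value supplied by the caller; `amu_version()` and the macro names are hypothetical local definitions, and reading the register itself is not shown:

```c
#include <assert.h>
#include <stdint.h>

/* ID_AA64PFR0_EL1.AMU occupies bits [47:44]. */
#define ID_AA64PFR0_AMU_SHIFT	44
#define ID_AA64PFR0_AMU_MASK	0xfULL

static_assert(ID_AA64PFR0_AMU_SHIFT + 4 <= 64,
	      "AMU field must fit in the 64-bit register");

/*
 * Hypothetical helper: decode the AMU field of an ID_AA64PFR0_EL1 value.
 * 0 = not implemented, 1 = FEAT_AMUv1, 2 = FEAT_AMUv1p1.
 */
static unsigned int amu_version(uint64_t id_aa64pfr0)
{
	return (unsigned int)((id_aa64pfr0 >> ID_AA64PFR0_AMU_SHIFT) &
			      ID_AA64PFR0_AMU_MASK);
}
```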
Jiajie Chen Oct. 25, 2024, 8:08 a.m. UTC | #2
On 2024/10/25 15:58, Marc Zyngier wrote:
> On Fri, 25 Oct 2024 04:12:58 +0100,
> Jiajie Chen <c@jia.je> wrote:
>> The x1e80100 CPU can have up to two cores running at 4.0 GHz, with one
>> core in the second cluster (cores 4-7) and the other in the third
>> cluster (cores 8-11). However, the scheduler is currently unaware of
>> this, leading to scenarios where a single core benchmark might run at
>> 3.4 GHz when scheduled to the first cluster.
>>
>> This patch introduces capacity-dmips-mhz nodes to each CPU node in the
>> DTS. For cores numbered 4 and 8, the capacities are set to 1200, while
>> others are set to 1024. This ensures that the two cores can be
>> prioritized for scheduling. The value 1200 is derived from approximately
>> `1024/3.4*4.0`.
>>
>> Note that capacity-dmips-mhz is not ideally suited for this purpose, as
>> it was designed to differentiate between performance and efficient
>> cores, not for core boosting. According to its definition, DMIPS/MHz
>> actually decreases with higher frequencies. However, since the CPU does
>> not support AMU, and no elegant solution was found, this approach is
>> used as a workaround.
> Are you sure?
>
> [    0.570323] CPU features: detected: Activity Monitors Unit (AMU) on CPU0-11
>
> So activity monitors are available. Not that what you have here is not
> useful, but this comment seems a bit... surprising.

Sorry for the false claim. I was looking for AMU in /proc/cpuinfo, where
it is not listed. Still, it does not seem to help the scheduling. Let me
have a look at it.


Best regards,

Jiajie Chen

>
> Thanks,
>
> 	M.
>
Dmitry Baryshkov Oct. 25, 2024, 11:04 a.m. UTC | #3
On Fri, Oct 25, 2024 at 11:12:58AM +0800, Jiajie Chen wrote:
> The x1e80100 CPU can have up to two cores running at 4.0 GHz, with one
> core in the second cluster (cores 4-7) and the other in the third
> cluster (cores 8-11). However, the scheduler is currently unaware of
> this, leading to scenarios where a single core benchmark might run at
> 3.4 GHz when scheduled to the first cluster.
> 
> This patch introduces capacity-dmips-mhz nodes to each CPU node in the
> DTS. For cores numbered 4 and 8, the capacities are set to 1200, while
> others are set to 1024. This ensures that the two cores can be
> prioritized for scheduling. The value 1200 is derived from approximately
> `1024/3.4*4.0`.
> 
> Note that capacity-dmips-mhz is not ideally suited for this purpose, as
> it was designed to differentiate between performance and efficient
> cores, not for core boosting. According to its definition, DMIPS/MHz
> actually decreases with higher frequencies. However, since the CPU does
> not support AMU, and no elegant solution was found, this approach is
> used as a workaround.
> 
> With this patch, we observe two cores running at full 4.0 GHz without
> core binding. The single core score of Geekbench 6 increases from 2452
> to 2892, both without core binding. Tested on Surface Laptop 7.

I think this is a nice hack, but I'd prefer to see the scheduler being
improved instead. From my (ignorant) point of view this should be close
to SMT-based scheduling. We should split the jobs between the clusters,
if that provides better power utilisation.

> 
> Signed-off-by: Jiajie Chen <c@jia.je>
> ---
>  arch/arm64/boot/dts/qcom/x1e80100.dtsi | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>

Patch

diff --git a/arch/arm64/boot/dts/qcom/x1e80100.dtsi b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
index cd732ef88cd8..c9c559d956c2 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100.dtsi
+++ b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
@@ -69,6 +69,7 @@  CPU0: cpu@0 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x0>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_0>;
 			power-domains = <&CPU_PD0>;
 			power-domain-names = "psci";
@@ -86,6 +87,7 @@  CPU1: cpu@100 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x100>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_0>;
 			power-domains = <&CPU_PD1>;
 			power-domain-names = "psci";
@@ -97,6 +99,7 @@  CPU2: cpu@200 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x200>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_0>;
 			power-domains = <&CPU_PD2>;
 			power-domain-names = "psci";
@@ -108,6 +111,7 @@  CPU3: cpu@300 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x300>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_0>;
 			power-domains = <&CPU_PD3>;
 			power-domain-names = "psci";
@@ -119,6 +123,7 @@  CPU4: cpu@10000 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x10000>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1200>;
 			next-level-cache = <&L2_1>;
 			power-domains = <&CPU_PD4>;
 			power-domain-names = "psci";
@@ -136,6 +141,7 @@  CPU5: cpu@10100 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x10100>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_1>;
 			power-domains = <&CPU_PD5>;
 			power-domain-names = "psci";
@@ -147,6 +153,7 @@  CPU6: cpu@10200 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x10200>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_1>;
 			power-domains = <&CPU_PD6>;
 			power-domain-names = "psci";
@@ -158,6 +165,7 @@  CPU7: cpu@10300 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x10300>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_1>;
 			power-domains = <&CPU_PD7>;
 			power-domain-names = "psci";
@@ -169,6 +177,7 @@  CPU8: cpu@20000 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x20000>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1200>;
 			next-level-cache = <&L2_2>;
 			power-domains = <&CPU_PD8>;
 			power-domain-names = "psci";
@@ -186,6 +195,7 @@  CPU9: cpu@20100 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x20100>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_2>;
 			power-domains = <&CPU_PD9>;
 			power-domain-names = "psci";
@@ -197,6 +207,7 @@  CPU10: cpu@20200 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x20200>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_2>;
 			power-domains = <&CPU_PD10>;
 			power-domain-names = "psci";
@@ -208,6 +219,7 @@  CPU11: cpu@20300 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x20300>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_2>;
 			power-domains = <&CPU_PD11>;
 			power-domain-names = "psci";
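One way to check the effect of the patch on a running system is to read each CPU's normalized capacity from sysfs, under /sys/devices/system/cpu/cpuN/cpu_capacity, which the arm64 topology code exports. A minimal sketch; `read_cpu_capacity()` is a hypothetical helper that returns -1 when the file is absent (for example on kernels or architectures that do not provide it):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read the scheduler capacity the kernel reports
 * for one CPU. Returns -1 if the sysfs file is missing or unparsable.
 */
static long read_cpu_capacity(int cpu)
{
	char path[64];
	long cap = -1;
	FILE *f;

	assert(cpu >= 0);
	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/cpu_capacity", cpu);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &cap) != 1)
		cap = -1;
	fclose(f);
	return cap;
}
```

Looping over CPUs 0 to 11 and printing the results should, with the patch applied, show CPUs 4 and 8 above the rest; the exact values depend on the reference frequencies cpufreq reports, so none are given here.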