Message ID | 20240429070322.999500-2-Xiaojian.Du@amd.com (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Mario Limonciello |
Series | [1/2] x86/cpufeatures: Add AMD FAST CPPC feature flag |
[AMD Official Use Only - General]

> -----Original Message-----
> From: Du, Xiaojian <Xiaojian.Du@amd.com>
> Sent: Monday, April 29, 2024 3:03 PM
> To: linux-kernel@vger.kernel.org; linux-pm@vger.kernel.org
> Cc: tglx@linutronix.de; mingo@redhat.com; bp@alien8.de;
> dave.hansen@linux.intel.com; hpa@zytor.com;
> daniel.sneddon@linux.intel.com; jpoimboe@kernel.org;
> pawan.kumar.gupta@linux.intel.com; Das1, Sandipan
> <Sandipan.Das@amd.com>; kai.huang@intel.com; Yuan, Perry
> <Perry.Yuan@amd.com>; x86@kernel.org; Huang, Ray
> <Ray.Huang@amd.com>; rafael@kernel.org; Du, Xiaojian
> <Xiaojian.Du@amd.com>; Limonciello, Mario <Mario.Limonciello@amd.com>
> Subject: [PATCH 2/2] cpufreq: amd-pstate: change cpu freq transition delay
> for some models
>
> Some AMD Zen 4 APUs/CPUs support adjusting the CPU core clock
> more quickly and precisely according to the CPU workload.
> This is advertised by the Fast CPPC x86 feature.
> This change only takes effect in the *passive mode* of the AMD pstate
> driver. Based on test results for different transition delay values,
> 600us was chosen as a balance between performance and power consumption.
>
> Some test results on AMD Ryzen 7840HS (Phoenix) APU:
>
> 1. Tbench
> (Energy less is better, Throughput more is better,
> PPW--Performance per Watt more is better)
> ============= =================== ================= ================= ================= ================= ================= ================= =================
> Trans Delay   Tbench governor:schedutil, 3-iterations average
> ============= =================== ================= ================= ================= ================= ================= ================= =================
> 1000us        Clients             1                 2                 4                 8                 12                16                32
>               Energy/Joules       2010              2804              8768              17171             16170             15132             15027
>               Throughput/(MB/s)   114               259               1041              3010              3135              4851              4605
>               PPW                 0.0567            0.0923            0.1187            0.1752            0.1938            0.3205            0.3064
> 600us         Clients             1                 2                 4                 8                 12                16                32
>               Energy/Joules       2115 (5.22%)      2388 (-14.84%)    10700 (22.03%)    16716 (-2.65%)    15939 (-1.43%)    15053 (-0.52%)    15083 (0.37%)
>               Throughput/(MB/s)   122 (7.02%)       234 (-9.65%)      1188 (14.12%)     3003 (-0.23%)     3143 (0.26%)      4842 (-0.19%)     4603 (-0.04%)
>               PPW                 0.0576 (1.59%)    0.0979 (6.07%)    0.111 (-6.49%)    0.1796 (2.51%)    0.1971 (1.70%)    0.3216 (0.34%)    0.3051 (-0.42%)
> ============= =================== ================= ================= ================= ================= ================= ================= =================
>
> 2. Dbench
> (Energy less is better, Throughput more is better,
> PPW--Performance per Watt more is better)
> ============= =================== ================= ================= ================= ================= ================= ================= =================
> Trans Delay   Dbench governor:schedutil, 3-iterations average
> ============= =================== ================= ================= ================= ================= ================= ================= =================
> 1000us        Clients             1                 2                 4                 8                 12                16                32
>               Energy/Joules       4890              3779              3567              5157              5611              6500              8163
>               Throughput/(MB/s)   327               167               220               577               775               938               1397
>               PPW                 0.0668            0.0441            0.0616            0.1118            0.1381            0.1443            0.1711
> 600us         Clients             1                 2                 4                 8                 12                16                32
>               Energy/Joules       4915 (0.51%)      4912 (29.98%)     3506 (-1.71%)     4907 (-4.85%)     5011 (-10.69%)    5672 (-12.74%)    8141 (-0.27%)
>               Throughput/(MB/s)   348 (6.42%)       284 (70.06%)      220 (0.00%)       518 (-10.23%)     712 (-8.13%)      854 (-8.96%)      1475 (5.58%)
>               PPW                 0.0708 (5.99%)    0.0578 (31.07%)   0.0627 (1.79%)    0.1055 (-5.64%)   0.142 (2.82%)     0.1505 (4.30%)    0.1811 (5.84%)
> ============= =================== ================= ================= ================= ================= ================= ================= =================
>
> 3. Hackbench (less time is better)
> ============= =========================== ==========================
> hackbench     governor:schedutil
> ============= =========================== ==========================
> Trans Delay   Process Mode Ave time(s)    Thread Mode Ave time(s)
> 1000us        14.484                      14.484
> 600us         14.418 (-0.46%)             15.41 (+6.39%)
> ============= =========================== ==========================
>
> 4. Perf_sched_bench (less time is better)
> ============= =================== ================= ================= ================= ================= ================= =================
> Trans Delay   perf_sched_bench governor:schedutil
> ============= =================== ================= ================= ================= ================= ================= =================
> 1000us        Groups              1                 2                 4                 8                 12                24
>               AveTime(s)          1.64              2.851             5.878             11.636            16.093            26.395
> 600us         Groups              1                 2                 4                 8                 12                24
>               AveTime(s)          1.69 (3.05%)      2.845 (-0.21%)    5.843 (-0.60%)    11.576 (-0.52%)   16.092 (-0.01%)   26.32 (-0.28%)
> ============= =================== ================= ================= ================= ================= ================= =================
>
> 5. Sysbench (higher is better)
> ============= =================== ================= ================= ================= ================= ================= =================
> Sysbench      governor:schedutil
> ============= =================== ================= ================= ================= ================= ================= =================
> 1000us        Thread              1                 2                 4                 8                 12                24
>               Ave events          6020.98           12273.39          24119.82          46171.57          47074.37          47831.72
> 600us         Thread              1                 2                 4                 8                 12                24
>               Ave events          6154.82 (2.22%)   12271.63 (-0.01%) 24392.5 (1.13%)   46117.64 (-0.12%) 46852.19 (-0.47%) 47678.92 (-0.32%)
> ============= =================== ================= ================= ================= ================= ================= =================
>
> In conclusion, a shorter CPU clock transition delay has a quite positive
> effect on PPW in the Dbench test, while keeping performance stable
> on Tbench, Hackbench, Perf_sched_bench and Sysbench.
>
> Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
>  drivers/cpufreq/amd-pstate.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 2015c9fcc3c9..8c8594f67af6 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -50,6 +50,7 @@
>
>  #define AMD_PSTATE_TRANSITION_LATENCY 20000
>  #define AMD_PSTATE_TRANSITION_DELAY 1000
> +#define AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY 600
>  #define AMD_PSTATE_PREFCORE_THRESHOLD 166
>
>  /*
> @@ -868,7 +869,11 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
>  	}
>
>  	policy->cpuinfo.transition_latency = AMD_PSTATE_TRANSITION_LATENCY;
> -	policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
> +
> +	if (cpu_feature_enabled(X86_FEATURE_FAST_CPPC))
> +		policy->transition_delay_us = AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY;
> +	else
> +		policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
>
>  	policy->min = min_freq;
>  	policy->max = max_freq;
> --
> 2.34.1

LGTM

Reviewed-by: Perry Yuan <perry.yuan@amd.com>
On 4/29/2024 02:03, Xiaojian Du wrote:
> Some AMD Zen 4 APUs/CPUs support adjusting the CPU core clock
> more quickly and precisely according to the CPU workload.
> This is advertised by the Fast CPPC x86 feature.
> This change only takes effect in the *passive mode* of the
> AMD pstate driver. Based on test results for different
> transition delay values, 600us was chosen as a balance
> between performance and power consumption.
>
> Some test results on AMD Ryzen 7840HS (Phoenix) APU:
>
> 1. Tbench
> (Energy less is better, Throughput more is better,
> PPW--Performance per Watt more is better)
> ============= =================== ================= ================= ================= ================= ================= ================= =================
> Trans Delay   Tbench governor:schedutil, 3-iterations average
> ============= =================== ================= ================= ================= ================= ================= ================= =================
> 1000us        Clients             1                 2                 4                 8                 12                16                32
>               Energy/Joules       2010              2804              8768              17171             16170             15132             15027
>               Throughput/(MB/s)   114               259               1041              3010              3135              4851              4605
>               PPW                 0.0567            0.0923            0.1187            0.1752            0.1938            0.3205            0.3064
> 600us         Clients             1                 2                 4                 8                 12                16                32
>               Energy/Joules       2115 (5.22%)      2388 (-14.84%)    10700 (22.03%)    16716 (-2.65%)    15939 (-1.43%)    15053 (-0.52%)    15083 (0.37%)
>               Throughput/(MB/s)   122 (7.02%)       234 (-9.65%)      1188 (14.12%)     3003 (-0.23%)     3143 (0.26%)      4842 (-0.19%)     4603 (-0.04%)
>               PPW                 0.0576 (1.59%)    0.0979 (6.07%)    0.111 (-6.49%)    0.1796 (2.51%)    0.1971 (1.70%)    0.3216 (0.34%)    0.3051 (-0.42%)
> ============= =================== ================= ================= ================= ================= ================= ================= =================
>
> 2. Dbench
> (Energy less is better, Throughput more is better,
> PPW--Performance per Watt more is better)
> ============= =================== ================= ================= ================= ================= ================= ================= =================
> Trans Delay   Dbench governor:schedutil, 3-iterations average
> ============= =================== ================= ================= ================= ================= ================= ================= =================
> 1000us        Clients             1                 2                 4                 8                 12                16                32
>               Energy/Joules       4890              3779              3567              5157              5611              6500              8163
>               Throughput/(MB/s)   327               167               220               577               775               938               1397
>               PPW                 0.0668            0.0441            0.0616            0.1118            0.1381            0.1443            0.1711
> 600us         Clients             1                 2                 4                 8                 12                16                32
>               Energy/Joules       4915 (0.51%)      4912 (29.98%)     3506 (-1.71%)     4907 (-4.85%)     5011 (-10.69%)    5672 (-12.74%)    8141 (-0.27%)
>               Throughput/(MB/s)   348 (6.42%)       284 (70.06%)      220 (0.00%)       518 (-10.23%)     712 (-8.13%)      854 (-8.96%)      1475 (5.58%)
>               PPW                 0.0708 (5.99%)    0.0578 (31.07%)   0.0627 (1.79%)    0.1055 (-5.64%)   0.142 (2.82%)     0.1505 (4.30%)    0.1811 (5.84%)
> ============= =================== ================= ================= ================= ================= ================= ================= =================
>
> 3. Hackbench (less time is better)
> ============= =========================== ==========================
> hackbench     governor:schedutil
> ============= =========================== ==========================
> Trans Delay   Process Mode Ave time(s)    Thread Mode Ave time(s)
> 1000us        14.484                      14.484
> 600us         14.418 (-0.46%)             15.41 (+6.39%)
> ============= =========================== ==========================
>
> 4. Perf_sched_bench (less time is better)
> ============= =================== ================= ================= ================= ================= ================= =================
> Trans Delay   perf_sched_bench governor:schedutil
> ============= =================== ================= ================= ================= ================= ================= =================
> 1000us        Groups              1                 2                 4                 8                 12                24
>               AveTime(s)          1.64              2.851             5.878             11.636            16.093            26.395
> 600us         Groups              1                 2                 4                 8                 12                24
>               AveTime(s)          1.69 (3.05%)      2.845 (-0.21%)    5.843 (-0.60%)    11.576 (-0.52%)   16.092 (-0.01%)   26.32 (-0.28%)
> ============= =================== ================= ================= ================= ================= ================= =================
>
> 5. Sysbench (higher is better)
> ============= =================== ================= ================= ================= ================= ================= =================
> Sysbench      governor:schedutil
> ============= =================== ================= ================= ================= ================= ================= =================
> 1000us        Thread              1                 2                 4                 8                 12                24
>               Ave events          6020.98           12273.39          24119.82          46171.57          47074.37          47831.72
> 600us         Thread              1                 2                 4                 8                 12                24
>               Ave events          6154.82 (2.22%)   12271.63 (-0.01%) 24392.5 (1.13%)   46117.64 (-0.12%) 46852.19 (-0.47%) 47678.92 (-0.32%)
> ============= =================== ================= ================= ================= ================= ================= =================
>
> In conclusion, a shorter CPU clock transition delay has a quite positive
> effect on PPW in the Dbench test, while keeping performance stable
> on Tbench, Hackbench, Perf_sched_bench and Sysbench.
>
> Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>

Rafael,

You can swap my R-b for an A-b when you pick this up after the merge window.
Thx!

Acked-by: Mario Limonciello <mario.limonciello@amd.com>

> ---
>  drivers/cpufreq/amd-pstate.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 2015c9fcc3c9..8c8594f67af6 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -50,6 +50,7 @@
>
>  #define AMD_PSTATE_TRANSITION_LATENCY 20000
>  #define AMD_PSTATE_TRANSITION_DELAY 1000
> +#define AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY 600
>  #define AMD_PSTATE_PREFCORE_THRESHOLD 166
>
>  /*
> @@ -868,7 +869,11 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
>  	}
>
>  	policy->cpuinfo.transition_latency = AMD_PSTATE_TRANSITION_LATENCY;
> -	policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
> +
> +	if (cpu_feature_enabled(X86_FEATURE_FAST_CPPC))
> +		policy->transition_delay_us = AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY;
> +	else
> +		policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
>
>  	policy->min = min_freq;
>  	policy->max = max_freq;
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 2015c9fcc3c9..8c8594f67af6 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -50,6 +50,7 @@
 
 #define AMD_PSTATE_TRANSITION_LATENCY 20000
 #define AMD_PSTATE_TRANSITION_DELAY 1000
+#define AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY 600
 #define AMD_PSTATE_PREFCORE_THRESHOLD 166
 
 /*
@@ -868,7 +869,11 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
 	}
 
 	policy->cpuinfo.transition_latency = AMD_PSTATE_TRANSITION_LATENCY;
-	policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
+
+	if (cpu_feature_enabled(X86_FEATURE_FAST_CPPC))
+		policy->transition_delay_us = AMD_PSTATE_FAST_CPPC_TRANSITION_DELAY;
+	else
+		policy->transition_delay_us = AMD_PSTATE_TRANSITION_DELAY;
 
 	policy->min = min_freq;
 	policy->max = max_freq;