diff mbox

Ask for help on governor

Message ID 000701d37479$e0570320$a1050960$@net (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Doug Smythies Dec. 14, 2017, 1:21 a.m. UTC
Note: adding Stratos, the commit author.
 
On 2017.12.12 22:22 Andy Tang wrote:

> Anyway I found the root cause myself.
>
> It was caused by commit: 00bfe05889e91b5112893b001e4a47b0a0f8bdd7.

Agreed.

Then why did my kernel bisection come to a different conclusion?
Because, in my case, the issue only manifested itself when the
sampling rate (which should really be called sampling period), became
low enough to bring the issue to the surface.

The other important thing to note for my system is that I use (O.K.,
steal) the Ubuntu kernel configuration and so my tick rate is 250 Hz, not
1000 Hz.

I think the math for the idle periods calculation breaks down here
(from drivers/cpufreq/cpufreq_governor.c):

                if (time_elapsed > 2 * sampling_rate) {
                        unsigned int periods = time_elapsed / sampling_rate;

                        if (periods < idle_periods)
                                idle_periods = periods;
                }

if 2 * sampling_rate is less than one jiffy.
I.E. isn't time_elapsed always exactly one jiffy for a fully loaded CPU?
Important note: on my system a jiffy is just over 4 milliseconds.
So, for my test which is 100% load on one CPU, basically, idle_periods is always 2,
maybe more and the conservative code is always resetting the target CPU frequency
to minimum.

For whatever reason, on my system, a frequency step of 5% will not raise the pstate,
even though it should (the math works out to 1790 MHz, or pstate 17, but I never see it.
If I raise the frequency step to anything else, the math makes complete sense. Example:
frequency step = 10% so 3800 * 0.1 + 1600 = 1980 which means I should see pstate 19 being
asked for. I do. However, it does not continue to increase because of the
idle_periods problem, driving it back as an intermediate calculation, so all I ever see is
requested pstate of 19.

O.K. so if all this is true, then a 1000 Hz kernel shouldn't have a problem. Sort of, it
doesn't. Why "sort of"? Because the default sample period of 500 usec is right
on the edge, and sometimes the requested pstate does drop as a result. I used this
command to watch the requested pstate:

watch -d -n 1 sudo rdmsr --bitfield 15:8 -d -a 0x199

(translation: watch the actual requested pstates for all CPUs, by reading the processor itself.)

and while for CPU 7, it should clamp at 38 (for my system) it doesn't. O.K. so just
increase the sampling period a little, to say 510 uSec, and then yes, it clamps at 38.

O.K. so finally, I reverted the commit (but in a cheating way):

 
And everything is fine.

... Doug

Comments

Andy Tang Dec. 14, 2017, 2:42 a.m. UTC | #1
> -----Original Message-----
> From: Doug Smythies [mailto:dsmythies@telus.net]
> Sent: Thursday, December 14, 2017 9:21 AM
> To: Andy Tang <andy.tang@nxp.com>; 'Viresh Kumar'
> <viresh.kumar@linaro.org>; 'Stratos Karafotis' <stratosk@semaphore.gr>
> Cc: 'Rafael J. Wysocki' <rafael@kernel.org>; 'Rafael J. Wysocki'
> <rjw@rjwysocki.net>; 'Linux PM' <linux-pm@vger.kernel.org>; Doug
> Smythies <dsmythies@telus.net>
> Subject: RE: Ask for help on governor
> 
> diff --git a/drivers/cpufreq/cpufreq_governor.c
> b/drivers/cpufreq/cpufreq_governor.c
> index 58d4f4e..3493ca7 100644
> --- a/drivers/cpufreq/cpufreq_governor.c
> +++ b/drivers/cpufreq/cpufreq_governor.c
> @@ -222,6 +222,7 @@ unsigned int dbs_update(struct cpufreq_policy
> *policy)
>                         max_load = load;
>         }
> 
> +       idle_periods = 0;

With this line added, conservative governor works well.

Ondemand governor still can't work well.
As I mentioned in earlier email, load varies from different sampling_rate.

Regards,
Andy

>         policy_dbs->idle_periods = idle_periods;
> 
>         return max_load;
> 
> And everything is fine.
> 
> ... Doug
>
Stratos Karafotis Dec. 14, 2017, 6:25 p.m. UTC | #2
Hi

On 14/12/2017 04:42 πμ, Andy Tang wrote:
> 
> 
>> -----Original Message-----
>> From: Doug Smythies [mailto:dsmythies@telus.net]
>> Sent: Thursday, December 14, 2017 9:21 AM
>> To: Andy Tang <andy.tang@nxp.com>; 'Viresh Kumar'
>> <viresh.kumar@linaro.org>; 'Stratos Karafotis' <stratosk@semaphore.gr>
>> Cc: 'Rafael J. Wysocki' <rafael@kernel.org>; 'Rafael J. Wysocki'
>> <rjw@rjwysocki.net>; 'Linux PM' <linux-pm@vger.kernel.org>; Doug
>> Smythies <dsmythies@telus.net>
>> Subject: RE: Ask for help on governor
>>
>> diff --git a/drivers/cpufreq/cpufreq_governor.c
>> b/drivers/cpufreq/cpufreq_governor.c
>> index 58d4f4e..3493ca7 100644
>> --- a/drivers/cpufreq/cpufreq_governor.c
>> +++ b/drivers/cpufreq/cpufreq_governor.c
>> @@ -222,6 +222,7 @@ unsigned int dbs_update(struct cpufreq_policy
>> *policy)
>>                         max_load = load;
>>         }
>>
>> +       idle_periods = 0;
> 
> With this line added, conservative governor works well.
> 
> Ondemand governor still can't work well.
> As I mentioned in earlier email, load varies from different sampling_rate.
> 
> Regards,
> Andy
> 

I'm not sure I understood what the problem is (maybe because I can't see
the description of the issue in the quotes).

Commit 00bfe05889e91b5112893b001e4a47b0a0f8bdd7 affects only conservative
governor.
Do you have the same issue with ondemand too?

Stratos
Doug Smythies Dec. 15, 2017, 1:29 a.m. UTC | #3
Hi,

On 2017.12.14 Stratos Karafotis wrote:

> I'm not sure I understood what the problem is (maybe because I can't see
> the description of the issue in the quotes).

For my part of it, the problem is that the default sample_period for
the intel_cpufreq scaling driver (i.e. intel_pstate in passive mode)
is 500 uSec. Everything, as far as I can determine, becomes busted if
the sampling period is less than a jiffy.

By the way, the default sampling period for the acpi_cpufreq driver seems to be
10 milliseconds, and everything is fine. I do not know why the defaults for
the intel_cpufreq driver are so short, and suggest that they shouldn't be,
in which case your commit might be O.K.

>
> Commit 00bfe05889e91b5112893b001e4a47b0a0f8bdd7 affects only conservative
> governor.
> Do you have the same issue with ondemand too?

Yes, this entire thread has been about the conservative governor.

That was the first time Andy mentioned issues with on demand, but
I do see it misbehave with the intel_cpufreq default sample period
(a.k.a sampling_rate) of 500uSec. See another e-mail I send in a moment.

... Doug
Doug Smythies Dec. 15, 2017, 1:30 a.m. UTC | #4
On 2017.12.13 18:42 Andy Tang wrote:
> On 2017.12.14 09:21 Doug wrote: 
>> 
>> diff --git a/drivers/cpufreq/cpufreq_governor.c
>> b/drivers/cpufreq/cpufreq_governor.c
>> index 58d4f4e..3493ca7 100644
>> --- a/drivers/cpufreq/cpufreq_governor.c
>> +++ b/drivers/cpufreq/cpufreq_governor.c
>> @@ -222,6 +222,7 @@ unsigned int dbs_update(struct cpufreq_policy
>> *policy)
>>                         max_load = load;
>>         }
>> 
>> +       idle_periods = 0;
>
> With this line added, conservative governor works well.
>
> Ondemand governor still can't work well.

Your original message said:

On 12-12-17, 02:46, Andy Tang wrote:
> I run into a question that conservative governor can't work while ondemand governor works well.

So, for my part of it, I never looked at ondemand.

> As I mentioned in earlier email, load varies from different sampling_rate.

Because that e-mail was html, a lot of people didn't get it, and it also is not
in the archives. Anyway:

On 2017.12.12 23:24 Andy wrote:

> I also found sampling_rate affects the CPU load dramatically.
>			sample rate
>			100000	10000	1000
> actual cpu load	1%		1%	1%
> cpu report load	1%		20%	40-100%
>
> So we can know that when sample_rate is higher than 10000us, cpufreq can not calculate the load correctly.

Can you tell us more details about how you did this test and how you know your data is correct?
In particular, what is your work/sleep frequency for your 1% load.
In my experience the work/sleep frequency has always been a very important consideration with these tests.
What is the tick rate of your kernel? Myself, I wouldn't expect a 1000 uSec sample period to work properly.

I used turbostat to verify my tests, and yes, on demand does start to misbehave if the sampling period
is short relative to the tick period (jiffy time), otherwise it was fine.

... Doug
Andy Tang Dec. 15, 2017, 1:56 a.m. UTC | #5
Hi,

> -----Original Message-----
> From: Doug Smythies [mailto:dsmythies@telus.net]
> Sent: Friday, December 15, 2017 9:31 AM
> To: Andy Tang <andy.tang@nxp.com>; 'Viresh Kumar'
> <viresh.kumar@linaro.org>; 'Stratos Karafotis' <stratosk@semaphore.gr>
> Cc: 'Rafael J. Wysocki' <rafael@kernel.org>; 'Rafael J. Wysocki'
> <rjw@rjwysocki.net>; 'Linux PM' <linux-pm@vger.kernel.org>
> Subject: RE: Ask for help on governor
> 
> On 2017.12.13 18:42 Andy Tang wrote:
> > On 2017.12.14 09:21 Doug wrote:
> >>
> >> diff --git a/drivers/cpufreq/cpufreq_governor.c
> >> b/drivers/cpufreq/cpufreq_governor.c
> >> index 58d4f4e..3493ca7 100644
> >> --- a/drivers/cpufreq/cpufreq_governor.c
> >> +++ b/drivers/cpufreq/cpufreq_governor.c
> >> @@ -222,6 +222,7 @@ unsigned int dbs_update(struct cpufreq_policy
> >> *policy)
> >>                         max_load = load;
> >>         }
> >>
> >> +       idle_periods = 0;
> >
> > With this line added, conservative governor works well.
> >
> > Ondemand governor still can't work well.
> 
> Your original message said:
> 
> On 12-12-17, 02:46, Andy Tang wrote:
> > I run into a question that conservative governor can't work while
> ondemand governor works well.
> 
> So, for my part of it, I never looked at ondemand.
Sorry for the misleading. Actually I observed two issues:
1. conservative governor can't increase cpu frequency even the load is high than 90%
2. CPU load can't be calculated correctly when sampling_rate is 1000 for both conservative and ondemand governor. At first I thought ondemand governor worked well. Actually I did not test it thoroughly. Ondemand governor can work only when cpu load is calculated correct.

> 
> > As I mentioned in earlier email, load varies from different sampling_rate.
> 
> Because that e-mail was html, a lot of people didn't get it, and it also is not in
> the archives. Anyway:
> 
> On 2017.12.12 23:24 Andy wrote:
> 
> > I also found sampling_rate affects the CPU load dramatically.
> >			sample rate
> >			100000	10000	1000
> > actual cpu load	1%		1%	1%
> > cpu report load	1%		20%	40-100%
> >
> > So we can know that when sample_rate is higher than 10000us, cpufreq
> can not calculate the load correctly.
> 
> Can you tell us more details about how you did this test and how you know
> your data is correct?
> In particular, what is your work/sleep frequency for your 1% load.
> In my experience the work/sleep frequency has always been a very
> important consideration with these tests.
> What is the tick rate of your kernel? Myself, I wouldn't expect a 1000 uSec
> sample period to work properly.
My tick rate is 250Hz.  And Yes, cpu load can't be calculated correctly.
We really should set sampling_rate a proper values as default if there are some sample rate don't work properly. We don't want to calculate a workable sampling_rate according to the tick rate manually.
 
That's odd too that you wouldn't expect a 1000 Usec sample period to work properly but set this value as default, even worse, you allow to set a lower value to it.

Is it better if we set a low limit of  sample period? 

Regards,
Andy
 
> 
> I used turbostat to verify my tests, and yes, on demand does start to
> misbehave if the sampling period is short relative to the tick period (jiffy
> time), otherwise it was fine.
> 
> ... Doug
>
diff mbox

Patch

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 58d4f4e..3493ca7 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -222,6 +222,7 @@  unsigned int dbs_update(struct cpufreq_policy *policy)
                        max_load = load;
        }

+       idle_periods = 0;
        policy_dbs->idle_periods = idle_periods;

        return max_load;