diff mbox

[linux-next] cpufreq: conservative: Fix sampling_down_factor functionality

Message ID 51351CC3.4010301@semaphore.gr (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Stratos Karafotis March 4, 2013, 10:14 p.m. UTC
sampling_down_factor tunable is unused since commit
8e677ce83bf41ba9c74e5b6d9ee60b07d4e5ed93 (4 years ago).

This patch restores the original functionality.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
---
 drivers/cpufreq/cpufreq_conservative.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Viresh Kumar March 5, 2013, 12:23 a.m. UTC | #1
On Tue, Mar 5, 2013 at 6:14 AM, Stratos Karafotis <stratosk@semaphore.gr> wrote:
> sampling_down_factor tunable is unused since commit
> 8e677ce83bf41ba9c74e5b6d9ee60b07d4e5ed93 (4 years ago).
>
> This patch restores the original functionality.
>
> Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
> ---
>  drivers/cpufreq/cpufreq_conservative.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
> index 4fd0006..4b27c21 100644
> --- a/drivers/cpufreq/cpufreq_conservative.c
> +++ b/drivers/cpufreq/cpufreq_conservative.c
> @@ -87,6 +87,12 @@ static void cs_check_cpu(int cpu, unsigned int load)
>                 return;
>         }
>
> +       /* if sampling_down_factor is active break out early */
> +       if (++dbs_info->down_skip < cs_tuners.sampling_down_factor)
> +               return;
> +
> +       dbs_info->down_skip = 0;
> +

Interesting. Because it was removed earlier and no body complained :)

I got following from Documentation:

sampling_down_factor: this parameter controls the rate at which the
kernel makes a decision on when to decrease the frequency while running
at top speed. When set to 1 (the default) decisions to reevaluate load
are made at the same interval regardless of current clock speed. But
when set to greater than 1 (e.g. 100) it acts as a multiplier for the
scheduling interval for reevaluating load when the CPU is at its top
speed due to high load. This improves performance by reducing the overhead
of load evaluation and helping the CPU stay at its top speed when truly
busy, rather than shifting back and forth in speed. This tunable has no
effect on behavior at lower speeds/lower CPU loads.

And i believe we are supposed to check if we are at the top speed or not.
Over that i believe the code should be like:

While setting speed to top speed, set the timer to delay * sampling_down_factor,
so that we actually don't reevaluate the load. What do you say?
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stratos Karafotis March 5, 2013, 5:22 a.m. UTC | #2
Hi Viresh,

On 03/05/2013 02:23 AM, Viresh Kumar wrote:> Interesting. Because it was removed earlier and no body complained :)
> 
> I got following from Documentation:
> 
> sampling_down_factor: this parameter controls the rate at which the
> kernel makes a decision on when to decrease the frequency while running
> at top speed. When set to 1 (the default) decisions to reevaluate load
> are made at the same interval regardless of current clock speed. But
> when set to greater than 1 (e.g. 100) it acts as a multiplier for the
> scheduling interval for reevaluating load when the CPU is at its top
> speed due to high load. This improves performance by reducing the overhead
> of load evaluation and helping the CPU stay at its top speed when truly
> busy, rather than shifting back and forth in speed. This tunable has no
> effect on behavior at lower speeds/lower CPU loads.
> 
> And i believe we are supposed to check if we are at the top speed or not.
> Over that i believe the code should be like:
> 
> While setting speed to top speed, set the timer to delay * sampling_down_factor,
> so that we actually don't reevaluate the load. What do you say?
> 

I had the same thoughts, but I saw the comments in the code:

/*
 * Every sampling_rate, we check, if current idle time is less than 20%
 * (default), then we try to increase frequency Every sampling_rate *
 * sampling_down_factor, we check, if current idle time is more than 80%, then
 * we try to decrease frequency
 *

Also checking the code before the commit 8e677ce83bf41ba9c74e5b6d9ee60b07d4e5ed93 you may see that sampling down factor works in this way.
So, I decided to keep the original functionality (also down_skip was already there unused).

Regards,
Stratos

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar March 5, 2013, 7:34 a.m. UTC | #3
On 5 March 2013 13:22, Stratos Karafotis <stratosk@semaphore.gr> wrote:
> I had the same thoughts, but I saw the comments in the code:
>
> /*
>  * Every sampling_rate, we check, if current idle time is less than 20%
>  * (default), then we try to increase frequency Every sampling_rate *
>  * sampling_down_factor, we check, if current idle time is more than 80%, then
>  * we try to decrease frequency

I misread it here when i looked at this mail for the first time. :)
I strongly believe that we need a full stop (.) before "Every sampling_rate",
otherwise it looks like we check for down_factor while increasing freq :)

>  *
>
> Also checking the code before the commit 8e677ce83bf41ba9c74e5b6d9ee60b07d4e5ed93 you may see that sampling down factor works in this way.
> So, I decided to keep the original functionality (also down_skip was already there unused).

I got that comment but i belive the code was never according to that comment
and not even now. Check the initial patch for conservative governor:
b9170836d1aa4ded7cc1ac1cb8fbc7867061c98c

Even now we aren't checking this 80% thing, right? And so in your patch we can
actually fix the patch too with the right logic of code.. And
documentation too :)
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David C Niemi March 5, 2013, 2:11 p.m. UTC | #4
I am the author of sampling_down_factor.  I only intended it to have an effect if you are already attempting to achieve maximum performance (e.g. at the top available clock speed for the current infrastructure).  This is special because there is no possibility of going any faster so the only speed change you can think about is slowing down, which is typically much less urgent than speeding up.  So I did not see sampling_down_factor as being relevant at intermediate frequencies, though perhaps there could be something interesting but opposite to do at the minimum-power state too.

In the applications this was originally intended for you were likely to either be idle and trying to save power (but be ready to ramp up quickly), or at max speed, and rarely in between.  sampling_down_factor made a dramatic improvement in performance, to the point there was no longer any reason to use the "performance" governor.

I suspect the rationale for sampling_down_factor would apply to non-frequency-based governor/driver combos as well, like the p-state driver, though in that case you are not necessarily changing a literal clock speed and you also have additional opportunities like turbo/boost states to consider.  But in some applications, you may not want to use boost states that cannot be sustained when all cores are active, so there is more user/operator intent that must be communicated to the governor.

DCN

On 03/05/13 00:22, Stratos Karafotis wrote:
> Hi Viresh,
>
> On 03/05/2013 02:23 AM, Viresh Kumar wrote:> Interesting. Because it was removed earlier and no body complained :)
>> I got following from Documentation:
>>
>> sampling_down_factor: this parameter controls the rate at which the
>> kernel makes a decision on when to decrease the frequency while running
>> at top speed. When set to 1 (the default) decisions to reevaluate load
>> are made at the same interval regardless of current clock speed. But
>> when set to greater than 1 (e.g. 100) it acts as a multiplier for the
>> scheduling interval for reevaluating load when the CPU is at its top
>> speed due to high load. This improves performance by reducing the overhead
>> of load evaluation and helping the CPU stay at its top speed when truly
>> busy, rather than shifting back and forth in speed. This tunable has no
>> effect on behavior at lower speeds/lower CPU loads.
>>
>> And i believe we are supposed to check if we are at the top speed or not.
>> Over that i believe the code should be like:
>>
>> While setting speed to top speed, set the timer to delay * sampling_down_factor,
>> so that we actually don't reevaluate the load. What do you say?
>>
> I had the same thoughts, but I saw the comments in the code:
>
> /*
>  * Every sampling_rate, we check, if current idle time is less than 20%
>  * (default), then we try to increase frequency Every sampling_rate *
>  * sampling_down_factor, we check, if current idle time is more than 80%, then
>  * we try to decrease frequency
>  *
>
> Also checking the code before the commit 8e677ce83bf41ba9c74e5b6d9ee60b07d4e5ed93 you may see that sampling down factor works in this way.
> So, I decided to keep the original functionality (also down_skip was already there unused).
>
> Regards,
> Stratos
>
> --
> To unsubscribe from this list: send the line "unsubscribe cpufreq" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David C Niemi March 5, 2013, 2:21 p.m. UTC | #5
I should clarify -- I wrote the sampling_down_factor in the *ondemand* governor.  I chose the name of the parameter based on the vaguely similar parameter in the conservative governor, but the documentation that was referenced (about it only applying at top speed and the comment about skipping evaluation opportunities when it is active) was written by me in reference to the ondemand governor.  It could be that someone backported some of the ondemand sampling_down_factor's behavior to the conservative governor.

I'd like to ask -- what is the intended use of the conservative governor these days as differentiated from the ondemand governor?  At one time it seemed more oriented towards power savings, but the ondemand governor had picked up most or all of its power-saving features.

DCN

On 03/05/13 09:11, David C Niemi wrote:
> I am the author of sampling_down_factor.  I only intended it to have an effect if you are already attempting to achieve maximum performance (e.g. at the top available clock speed for the current infrastructure).  This is special because there is no possibility of going any faster so the only speed change you can think about is slowing down, which is typically much less urgent than speeding up.  So I did not see sampling_down_factor as being relevant at intermediate frequencies, though perhaps there could be something interesting but opposite to do at the minimum-power state too.
>
> In the applications this was originally intended for you were likely to either be idle and trying to save power (but be ready to ramp up quickly), or at max speed, and rarely in between.  sampling_down_factor made a dramatic improvement in performance, to the point there was no longer any reason to use the "performance" governor.
>
> I suspect the rationale for sampling_down_factor would apply to non-frequency-based governor/driver combos as well, like the p-state driver, though in that case you are not necessarily changing a literal clock speed and you also have additional opportunities like turbo/boost states to consider.  But in some applications, you may not want to use boost states that cannot be sustained when all cores are active, so there is more user/operator intent that must be communicated to the governor.
>
> DCN

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stratos Karafotis March 5, 2013, 8:15 p.m. UTC | #6
On 03/05/2013 09:34 AM, Viresh Kumar wrote:
> On 5 March 2013 13:22, Stratos Karafotis <stratosk@semaphore.gr> wrote:
> I misread it here when i looked at this mail for the first time. :)
> I strongly believe that we need a full stop (.) before "Every sampling_rate",
> otherwise it looks like we check for down_factor while increasing freq :)

I agree. I will do that.

> Even now we aren't checking this 80% thing, right? And so in your patch we can
> actually fix the patch too with the right logic of code.. And
> documentation too :)

In my opinion the logic was initially correct. It was broken in the same 
commit that broke also sampling_down_factor.

Now we check if load < (cs_tuners.down_threshold - 10) to decrease freq.
Down threshold is 20, so we actually check the 80% idle.

I think the subtraction of 10 from down_threshold is wrong. It seems 
similar with ondemand but there is no logic for this in conservative.
User can simply select the down_threshold and the load will be compared 
with user's value. No need to alter user's selection.

I will prepare a patchset for these changes.

Regards,
Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stratos Karafotis March 5, 2013, 8:37 p.m. UTC | #7
Hi David,
On 03/05/2013 04:21 PM, David C Niemi wrote:
>
> I should clarify -- I wrote the sampling_down_factor in the *ondemand* governor.  I chose the name of the parameter based on the vaguely similar parameter in the conservative governor, but the documentation that was referenced (about it only applying at top speed and the comment about skipping evaluation opportunities when it is active) was written by me in reference to the ondemand governor.  It could be that someone backported some of the ondemand sampling_down_factor's behavior to the conservative governor.
>
> I'd like to ask -- what is the intended use of the conservative governor these days as differentiated from the ondemand governor?  At one time it seemed more oriented towards power savings, but the ondemand governor had picked up most or all of its power-saving features.

Thanks for the information.
I would agree about the use of conservative, but I think that I'm not
the right person to answer this question. :)

Regards,
Stratos
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar March 6, 2013, 6:43 a.m. UTC | #8
On 5 March 2013 22:21, David C Niemi <dniemi@verisign.com> wrote:
> I'd like to ask -- what is the intended use of the conservative governor these days as differentiated from the ondemand governor?  At one time it seemed more oriented towards power savings, but the ondemand governor had picked up most or all of its power-saving features.

I believe the difference is still the same.. conservative is not as
aggressive as
ondemand now also, i don't know which people are using it though :)
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
index 4fd0006..4b27c21 100644
--- a/drivers/cpufreq/cpufreq_conservative.c
+++ b/drivers/cpufreq/cpufreq_conservative.c
@@ -87,6 +87,12 @@  static void cs_check_cpu(int cpu, unsigned int load)
                return;
        }

+       /* if sampling_down_factor is active break out early */
+       if (++dbs_info->down_skip < cs_tuners.sampling_down_factor)
+               return;
+
+       dbs_info->down_skip = 0;
+
        /*
         * The optimal frequency is the frequency that is the lowest that can
         * support the current CPU usage without triggering the up policy. To be