cpufreq: Add scaling frequency range support

Message ID 55B8A3F2.8030809@intel.com (mailing list archive)
State Rejected, archived
Commit Message

xinhui.pan July 29, 2015, 9:59 a.m. UTC
hi, Rafael
	thanks for your reply.

On 2015-07-29 08:18, Rafael J. Wysocki wrote:
> On Tuesday, July 28, 2015 12:53:33 PM Pan Xinhui wrote:
>> hi, Viresh
>> 	thanks for your reply :)
>> On 2015-07-28 12:29, Viresh Kumar wrote:
>>> On 28-07-15, 11:32, Pan Xinhui wrote:
>>>> From: Pan Xinhui <xinhuix.pan@intel.com>
>>>>
>>>> Most of the time, userspace has to do cpufreq tests in a very inconvenient way.
>>>> Currently they have to echo min and max cpu freq separately like below:
>>>> echo 480000  > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
>>>> echo 2240000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
>>>>
>>>> Add scaling_freq_range cpufreq attr to support userspace's demand.
>>>> Therefore it's easier for testers to write readable scripts like below: 
>>>> echo 480000-2240000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_freq_range
>>>
>>> I don't think this brings any good change, we already have support for
>>> that with min/max freqs and I don't see how scripts can be less
>>> readable with that.
>>>
>> Yes, min/max are supported, but that is inconvenient, and sometimes it very easily causes obscure bugs.
>> For example, someone might write a script like the one below.
>> echo 480000  > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
>> echo 960000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
>> .....//other works
>> echo 1120000  > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
>> echo 2240000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
>> ...//other works
>>
>> But echoing 1120000 to scaling_min_freq did not work, as the current max freq is smaller than it.
>> It's hard to figure that out in a big script... we have many such scripts.
> 
> Fix them, then, pretty please.
> 
of course we will fix them. :)

> And adding this attribute is not going to magically fix them, is it?
> 
Right, this patch cannot fix them without changing the scripts. BUT I have another patch which could magically fix them. :)

These two attribute files are tricky because they depend on each other.
That is unlike attribute files in other parts of the kernel, for example /proc/sys/fs/file-max.
There the minimum is always zero, so it is reasonable to expose only a file-max attribute.

The order in which we echo values to min/max_freq is very important. Maybe we can say the files have *state*.
It is like a developer writing a buffer to a file, which has to be done in the order below.
fp = fopen(..)
 => fwrite(...)
  => fclose(...)

The script I mentioned above did not follow the right sequence. When a script wants to raise the min above the current max, it needs to set the max first to avoid the min > max issue...
So max/min_freq have *state*, just like the TCP three-way handshake (SYN, SYN+ACK, ACK): the sequence (this is the so-called state) is very important.

Now I want to offer a stateless attribute to user-space :)
This is a design/engineering question. It's okay for the kernel not to offer such an attribute, but then user-space has to do more work.
For example, in the worst case we need four system calls:
read min/max_freq (two system calls)
set min or max first to avoid the min > max issue (one system call)
set the other limit to its new value (one system call)

What if we offer a *set freq range* attribute? Just one. :)
set freq range (one system call)

From a performance point of view, it's a good idea to offer such an attribute.

There is another reason why it's good to apply this patch.
Say a cpufreq range of 480000-960000 is called powersave, 480000-2240000 is normal, and 1920000-2240000 is performance.
Assume the current cpufreq range is powersave and the user wants to switch to performance to play a 3D game.
BUT the user has to set it to normal first and only then to performance, because min(performance) > max(powersave).....
I don't know what people (end users) would think about such behavior.... Why must we go back to normal first, then performance?

As for the patch I mentioned above which could magically fix them:
the solution is to change the store_scaling_max_freq and store_scaling_min_freq sysfs callbacks so that they have *state*,
i.e. always keep the value from user-space.

patch like:




Thanks
xinhui

> Thanks,
> Rafael
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

xinhui.pan July 29, 2015, 10:04 a.m. UTC | #1
On 2015-07-29 17:59, Pan Xinhui wrote:
> patch like:
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 8772346..00e6965 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -615,6 +615,14 @@ static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
>  static int cpufreq_set_policy(struct cpufreq_policy *policy,
>                                 struct cpufreq_policy *new_policy);
>  
> +static void
> +cpufreq_get_user_policy_freq(struct cpufreq_real_policy *user_policy,
> +                               struct cpufreq_policy *policy)
> +{
> +       policy->min = user_policy->min;
> +       policy->max = user_policy->max;
> +}
> +
>  /**
>   * cpufreq_per_cpu_attr_write() / store_##file_name() - sysfs write access
>   */
> @@ -622,21 +630,20 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
>  static ssize_t store_##file_name                                       \
>  (struct cpufreq_policy *policy, const char *buf, size_t count)         \
>  {                                                                      \
> -       int ret, temp;                                                  \
> +       int ret;                                                        \
>         struct cpufreq_policy new_policy;                               \
>                                                                         \
>         ret = cpufreq_get_policy(&new_policy, policy->cpu);             \
>         if (ret)                                                        \
>                 return -EINVAL;                                         \
>                                                                         \
> +       cpufreq_get_user_policy_freq(&policy->user_policy, &new_policy);\
>         ret = sscanf(buf, "%u", &new_policy.object);                    \
>         if (ret != 1)                                                   \
>                 return -EINVAL;                                         \
>                                                                         \
> -       temp = new_policy.object;                                       \
> -       ret = cpufreq_set_policy(policy, &new_policy);          \
> -       if (!ret)                                                       \
> -               policy->user_policy.object = temp;                      \
> +       policy->user_policy.object = policy->object;                    \
Correction: the line above should be
+	policy->user_policy.object = new_policy.object;		\
Sorry for that.
> +       ret = cpufreq_set_policy(policy, &new_policy);                  \
>                                                                         \
>         return ret ? ret : count;                                       \
>  }
> 
>

> 
> 
> Thanks
> xinhui
> 
>> Thanks,
>> Rafael
>>
Viresh Kumar July 29, 2015, 11:01 a.m. UTC | #2
On 29-07-15, 18:04, Pan Xinhui wrote:
> > @@ -622,21 +630,20 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
> >  static ssize_t store_##file_name                                       \
> >  (struct cpufreq_policy *policy, const char *buf, size_t count)         \
> >  {                                                                      \
> > -       int ret, temp;                                                  \
> > +       int ret;                                                        \
> >         struct cpufreq_policy new_policy;                               \
> >                                                                         \
> >         ret = cpufreq_get_policy(&new_policy, policy->cpu);             \
> >         if (ret)                                                        \
> >                 return -EINVAL;                                         \
> >                                                                         \
> > +       cpufreq_get_user_policy_freq(&policy->user_policy, &new_policy);\
> >         ret = sscanf(buf, "%u", &new_policy.object);                    \
> >         if (ret != 1)                                                   \
> >                 return -EINVAL;                                         \
> >                                                                         \
> > -       temp = new_policy.object;                                       \
> > -       ret = cpufreq_set_policy(policy, &new_policy);          \
> > -       if (!ret)                                                       \
> > -               policy->user_policy.object = temp;                      \
> > +       policy->user_policy.object = policy->object;                    \
> should be 
> +	policy->user_policy.object = new_policy.object;		\
> sorry for that.
> > +       ret = cpufreq_set_policy(policy, &new_policy);                  \

This is wrong because we save the user preference even when setting the policy
failed. So that's surely bad.
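
To illustrate the objection, here is a toy model in POSIX shell, not kernel
code. All names in it (apply_min, store_min_after_success,
store_min_before_apply) are hypothetical, and apply_min only stands in for
cpufreq_set_policy() rejecting a min above the current max. The two store
functions mirror the removed and the added lines of the quoted hunk.

```shell
#!/bin/sh
# Toy model only: every name here is hypothetical and merely mirrors the diff.
policy_min=480000      # limit currently in effect
policy_max=960000      # limit currently in effect
user_min=$policy_min   # what we remember as the user's request

# Stand-in for cpufreq_set_policy(): fail when the new min exceeds the max.
apply_min() {
    [ "$1" -le "$policy_max" ] || return 1
    policy_min=$1
}

# Ordering before the patch: remember the request only if it was applied.
store_min_after_success() {
    apply_min "$1" || return 1
    user_min=$1
}

# Ordering in the patch: remember the request first, then try to apply it.
store_min_before_apply() {
    user_min=$1
    apply_min "$1"
}
```

With the limits at 480000-960000, a failed store of 1120000 through the first
variant leaves user_min at 480000. The second variant also fails, but leaves
user_min at 1120000 while the policy keeps running with a min of 480000.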
Rafael J. Wysocki July 29, 2015, 10:40 p.m. UTC | #3
On Wednesday, July 29, 2015 05:59:14 PM Pan Xinhui wrote:
> The script I mentioned above did not follow the right sequence. when script wants to set the min higher, we need set the max first to avoid min > max issue...
> So max/min_freq have *state*. just like TCP Three-way handshake, SYN, ACK&SYN, ACK. the sequence(this is so-called state) is very important.

No, this isn't like that.  The rule is simple: the value in scaling_min_freq
must not be greater than the value in scaling_max_freq at any time.  So there
is a correlation between them, but the only "state" is those numbers written
to them previously (which they preserve quite as expected).

And the algorithm is: look for what's in min and write a number which is not
less than that to max.  And the other way around.
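
A minimal sketch of that algorithm as a POSIX shell helper, under stated
assumptions: the function name set_freq_range and its argument order are made
up for illustration, and the sysfs files are assumed to refuse a write that
would make min exceed max, as described earlier in this thread. It reads the
current max and picks the write order so that min never exceeds max between
the two writes.

```shell
#!/bin/sh
# set_freq_range DIR MIN MAX  (hypothetical helper, not an existing tool)
# DIR is a cpufreq policy directory such as
# /sys/devices/system/cpu/cpu0/cpufreq.
set_freq_range() {
    dir=$1
    new_min=$2
    new_max=$3

    # Refuse an inverted range before touching anything.
    if [ "$new_min" -gt "$new_max" ]; then
        echo "set_freq_range: min $new_min > max $new_max" >&2
        return 1
    fi

    cur_max=$(cat "$dir/scaling_max_freq") || return 1

    if [ "$new_min" -gt "$cur_max" ]; then
        # New min is above the current max: raise max first.
        echo "$new_max" > "$dir/scaling_max_freq" || return 1
        echo "$new_min" > "$dir/scaling_min_freq" || return 1
    else
        # New min fits under the current max: write min first, then max.
        echo "$new_min" > "$dir/scaling_min_freq" || return 1
        echo "$new_max" > "$dir/scaling_max_freq" || return 1
    fi
}
```

Called as set_freq_range /sys/devices/system/cpu/cpu0/cpufreq 1120000 2240000
from the 480000-960000 state in the script quoted above, this should raise max
before min, so neither write violates the rule.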

Again, please fix your scripts and don't litter the kernel with stuff which
only is needed because user space developers can't get their act together.

Case dismissed.

Thanks,
Rafael


Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 8772346..00e6965 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -615,6 +615,14 @@  static ssize_t show_scaling_cur_freq(struct cpufreq_policy *policy, char *buf)
 static int cpufreq_set_policy(struct cpufreq_policy *policy,
                                struct cpufreq_policy *new_policy);
 
+static void
+cpufreq_get_user_policy_freq(struct cpufreq_real_policy *user_policy,
+                               struct cpufreq_policy *policy)
+{
+       policy->min = user_policy->min;
+       policy->max = user_policy->max;
+}
+
 /**
  * cpufreq_per_cpu_attr_write() / store_##file_name() - sysfs write access
  */
@@ -622,21 +630,20 @@  static int cpufreq_set_policy(struct cpufreq_policy *policy,
 static ssize_t store_##file_name                                       \
 (struct cpufreq_policy *policy, const char *buf, size_t count)         \
 {                                                                      \
-       int ret, temp;                                                  \
+       int ret;                                                        \
        struct cpufreq_policy new_policy;                               \
                                                                        \
        ret = cpufreq_get_policy(&new_policy, policy->cpu);             \
        if (ret)                                                        \
                return -EINVAL;                                         \
                                                                        \
+       cpufreq_get_user_policy_freq(&policy->user_policy, &new_policy);\
        ret = sscanf(buf, "%u", &new_policy.object);                    \
        if (ret != 1)                                                   \
                return -EINVAL;                                         \
                                                                        \
-       temp = new_policy.object;                                       \
-       ret = cpufreq_set_policy(policy, &new_policy);          \
-       if (!ret)                                                       \
-               policy->user_policy.object = temp;                      \
+       policy->user_policy.object = new_policy.object;                 \
+       ret = cpufreq_set_policy(policy, &new_policy);                  \
                                                                        \
        return ret ? ret : count;                                       \
 }