diff mbox

[4/5] intel_pstate: skip scheduler hook when in "performance" mode.

Message ID 1ecfadf4193fbd8873d4bcda9e9faec5825bcafb.1496889051.git.len.brown@intel.com (mailing list archive)
State RFC, archived
Headers show

Commit Message

Len Brown June 8, 2017, 2:39 a.m. UTC
From: Len Brown <len.brown@intel.com>

When the governor is set to "performance", intel_pstate does not
need the scheduler hook for doing any calculations.  Under these
conditions, its only purpose is to continue to maintain
cpufreq/scaling_cur_freq.

But the cpufreq/scaling_cur_freq sysfs attribute is now provided by
the x86 cpufreq core on all modern x86 systems, including
all systems supported by the intel_pstate driver.

So in "performance" governor mode, the scheduler hook can be skipped.
This applies to both in Software and Hardware P-state control modes.

Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
---
 drivers/cpufreq/intel_pstate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Rafael J. Wysocki June 17, 2017, 12:04 a.m. UTC | #1
On Wednesday, June 07, 2017 07:39:15 PM Len Brown wrote:
> From: Len Brown <len.brown@intel.com>
> 
> When the governor is set to "performance", intel_pstate does not
> need the scheduler hook for doing any calculations.  Under these
> conditions, its only purpose is to continue to maintain
> cpufreq/scaling_cur_freq.
> 
> But the cpufreq/scaling_cur_freq sysfs attribute is now provided by
> the x86 cpufreq core on all modern x86 systems, including
> all systems supported by the intel_pstate driver.
> 
> So in "performance" governor mode, the scheduler hook can be skipped.
> This applies to both in Software and Hardware P-state control modes.
> 
> Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Signed-off-by: Len Brown <len.brown@intel.com>
> ---
>  drivers/cpufreq/intel_pstate.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 5d67780..0ff3a4b 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -2025,10 +2025,10 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>  		 */
>  		intel_pstate_clear_update_util_hook(policy->cpu);

The statement above shouldn't be necessary any more after the change below.

>  		intel_pstate_max_within_limits(cpu);
> +	} else {
> +		intel_pstate_set_update_util_hook(policy->cpu);
>  	}
>  
> -	intel_pstate_set_update_util_hook(policy->cpu);
> -
>  	if (hwp_active)
>  		intel_pstate_hwp_set(policy->cpu);
>  
> 

What about update_turbo_pstate()?

In theory MSR_IA32_MISC_ENABLE_TURBO_DISABLE can be set at any time, so
wouldn't that become problematic after this change?

Thanks,
Rafael
Len Brown June 17, 2017, 12:52 a.m. UTC | #2
On Fri, Jun 16, 2017 at 8:04 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, June 07, 2017 07:39:15 PM Len Brown wrote:
>> From: Len Brown <len.brown@intel.com>
>>
>> When the governor is set to "performance", intel_pstate does not
>> need the scheduler hook for doing any calculations.  Under these
>> conditions, its only purpose is to continue to maintain
>> cpufreq/scaling_cur_freq.
>>
>> But the cpufreq/scaling_cur_freq sysfs attribute is now provided by
>> the x86 cpufreq core on all modern x86 systems, including
>> all systems supported by the intel_pstate driver.
>>
>> So in "performance" governor mode, the scheduler hook can be skipped.
>> This applies to both in Software and Hardware P-state control modes.
>>
>> Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>> Signed-off-by: Len Brown <len.brown@intel.com>
>> ---
>>  drivers/cpufreq/intel_pstate.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>> index 5d67780..0ff3a4b 100644
>> --- a/drivers/cpufreq/intel_pstate.c
>> +++ b/drivers/cpufreq/intel_pstate.c
>> @@ -2025,10 +2025,10 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>>                */
>>               intel_pstate_clear_update_util_hook(policy->cpu);
>
> The statement above shouldn't be necessary any more after the change below.

The policy can change at run time form something other than  performance
to performance, so we want to clear the hook in that case, no?

>>               intel_pstate_max_within_limits(cpu);
>> +     } else {
>> +             intel_pstate_set_update_util_hook(policy->cpu);
>>       }
>>
>> -     intel_pstate_set_update_util_hook(policy->cpu);
>> -
>>       if (hwp_active)
>>               intel_pstate_hwp_set(policy->cpu);
>>
>
> What about update_turbo_pstate()?
>
> In theory MSR_IA32_MISC_ENABLE_TURBO_DISABLE can be set at any time, so
> wouldn't that become problematic after this change?

yes, the sysfs "no_turbo" attribute can be modified at any time, invoking
update_turbo_state(), which will update MSR_IA32_MISC_ENABLE_TURBO_DISABLE

But how is the presence or change in turbo related to the lack of a
need to hook the scheduler callback in "performance" mode?  The hook
literally does nothing in this case, except consume cycles, no?
Rafael J. Wysocki June 17, 2017, 1:06 a.m. UTC | #3
On Friday, June 16, 2017 08:52:53 PM Len Brown wrote:
> On Fri, Jun 16, 2017 at 8:04 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Wednesday, June 07, 2017 07:39:15 PM Len Brown wrote:
> >> From: Len Brown <len.brown@intel.com>
> >>
> >> When the governor is set to "performance", intel_pstate does not
> >> need the scheduler hook for doing any calculations.  Under these
> >> conditions, its only purpose is to continue to maintain
> >> cpufreq/scaling_cur_freq.
> >>
> >> But the cpufreq/scaling_cur_freq sysfs attribute is now provided by
> >> the x86 cpufreq core on all modern x86 systems, including
> >> all systems supported by the intel_pstate driver.
> >>
> >> So in "performance" governor mode, the scheduler hook can be skipped.
> >> This applies to both in Software and Hardware P-state control modes.
> >>
> >> Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> >> Signed-off-by: Len Brown <len.brown@intel.com>
> >> ---
> >>  drivers/cpufreq/intel_pstate.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> >> index 5d67780..0ff3a4b 100644
> >> --- a/drivers/cpufreq/intel_pstate.c
> >> +++ b/drivers/cpufreq/intel_pstate.c
> >> @@ -2025,10 +2025,10 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
> >>                */
> >>               intel_pstate_clear_update_util_hook(policy->cpu);
> >
> > The statement above shouldn't be necessary any more after the change below.
> 
> The policy can change at run time form something other than  performance
> to performance, so we want to clear the hook in that case, no?

Yes.

> >>               intel_pstate_max_within_limits(cpu);
> >> +     } else {
> >> +             intel_pstate_set_update_util_hook(policy->cpu);
> >>       }
> >>
> >> -     intel_pstate_set_update_util_hook(policy->cpu);
> >> -
> >>       if (hwp_active)
> >>               intel_pstate_hwp_set(policy->cpu);
> >>
> >
> > What about update_turbo_pstate()?
> >
> > In theory MSR_IA32_MISC_ENABLE_TURBO_DISABLE can be set at any time, so
> > wouldn't that become problematic after this change?
> 
> yes, the sysfs "no_turbo" attribute can be modified at any time, invoking
> update_turbo_state(), which will update MSR_IA32_MISC_ENABLE_TURBO_DISABLE

If that was the only way it could change, I wouldn't worry about it, but what
about changes by BMCs and similar?  Are they not a concern?

> But how is the presence or change in turbo related to the lack of a
> need to hook the scheduler callback in "performance" mode?  The hook
> literally does nothing in this case, except consume cycles, no?

No.

It actually sets the P-state to the current maximum (which admittedly is
excessive) exactly because the maximum may change on the fly in theory.

If it can't change on the fly (or we don't care), we can do some more
simplifications there. :-)

Thanks,
Rafael
Len Brown June 17, 2017, 1:31 a.m. UTC | #4
On Fri, Jun 16, 2017 at 9:06 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Friday, June 16, 2017 08:52:53 PM Len Brown wrote:
>> On Fri, Jun 16, 2017 at 8:04 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> > On Wednesday, June 07, 2017 07:39:15 PM Len Brown wrote:
>> >> From: Len Brown <len.brown@intel.com>
>> >>
>> >> When the governor is set to "performance", intel_pstate does not
>> >> need the scheduler hook for doing any calculations.  Under these
>> >> conditions, its only purpose is to continue to maintain
>> >> cpufreq/scaling_cur_freq.
>> >>
>> >> But the cpufreq/scaling_cur_freq sysfs attribute is now provided by
>> >> the x86 cpufreq core on all modern x86 systems, including
>> >> all systems supported by the intel_pstate driver.
>> >>
>> >> So in "performance" governor mode, the scheduler hook can be skipped.
>> >> This applies to both in Software and Hardware P-state control modes.
>> >>
>> >> Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
>> >> Signed-off-by: Len Brown <len.brown@intel.com>
>> >> ---
>> >>  drivers/cpufreq/intel_pstate.c | 4 ++--
>> >>  1 file changed, 2 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
>> >> index 5d67780..0ff3a4b 100644
>> >> --- a/drivers/cpufreq/intel_pstate.c
>> >> +++ b/drivers/cpufreq/intel_pstate.c
>> >> @@ -2025,10 +2025,10 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy)
>> >>                */
>> >>               intel_pstate_clear_update_util_hook(policy->cpu);
>> >
>> > The statement above shouldn't be necessary any more after the change below.
>>
>> The policy can change at run time form something other than  performance
>> to performance, so we want to clear the hook in that case, no?
>
> Yes.
>
>> >>               intel_pstate_max_within_limits(cpu);
>> >> +     } else {
>> >> +             intel_pstate_set_update_util_hook(policy->cpu);
>> >>       }
>> >>
>> >> -     intel_pstate_set_update_util_hook(policy->cpu);
>> >> -
>> >>       if (hwp_active)
>> >>               intel_pstate_hwp_set(policy->cpu);
>> >>
>> >
>> > What about update_turbo_pstate()?
>> >
>> > In theory MSR_IA32_MISC_ENABLE_TURBO_DISABLE can be set at any time, so
>> > wouldn't that become problematic after this change?
>>
>> yes, the sysfs "no_turbo" attribute can be modified at any time, invoking
>> update_turbo_state(), which will update MSR_IA32_MISC_ENABLE_TURBO_DISABLE
>
> If that was the only way it could change, I wouldn't worry about it, but what
> about changes by BMCs and similar?  Are they not a concern?
>
>> But how is the presence or change in turbo related to the lack of a
>> need to hook the scheduler callback in "performance" mode?  The hook
>> literally does nothing in this case, except consume cycles, no?
>
> No.
>
> It actually sets the P-state to the current maximum (which admittedly is
> excessive) exactly because the maximum may change on the fly in theory.

There are 2 cases.

If turbo was enabled and were we requesting max turbo
and "somebody" disabled turbo in an MSR, then the HW would
simply clip our excessive req
> If it can't change on the fly (or we don't care), we can do some more
> simplifications there. :-)

I do not think it is Linux's responsibility to monitor changes to MSRs
such as Turbo enable/disable done behind its back by a BMC at run-time.
(if this is even possible)
Len Brown June 17, 2017, 1:35 a.m. UTC | #5
sorry, that was a premature send...

>>> > What about update_turbo_pstate()?
>>> >
>>> > In theory MSR_IA32_MISC_ENABLE_TURBO_DISABLE can be set at any time, so
>>> > wouldn't that become problematic after this change?
>>>
>>> yes, the sysfs "no_turbo" attribute can be modified at any time, invoking
>>> update_turbo_state(), which will update MSR_IA32_MISC_ENABLE_TURBO_DISABLE
>>
>> If that was the only way it could change, I wouldn't worry about it, but what
>> about changes by BMCs and similar?  Are they not a concern?
>>
>>> But how is the presence or change in turbo related to the lack of a
>>> need to hook the scheduler callback in "performance" mode?  The hook
>>> literally does nothing in this case, except consume cycles, no?
>>
>> No.
>>
>> It actually sets the P-state to the current maximum (which admittedly is
>> excessive) exactly because the maximum may change on the fly in theory.
>
> There are 2 cases.
>
> If turbo was enabled and were we requesting max turbo
> and "somebody" disabled turbo in an MSR, then the HW would
> simply clip our excessive
request to what the hardware supports.

if turbo was disabled and we were requesting max non-turbo,
and "somebody" enabled turbo in the MSR,
then it is highly likely that current request is already sufficiently
high enough
to enable turbo.  "highly likely" = would work on 100% of the
machines I have ever seen, including those with mis-configured TAR.

ie. it should not matter.

>> If it can't change on the fly (or we don't care), we can do some more
>> simplifications there. :-)
>
> I do not think it is Linux's responsibility to monitor changes to MSRs
> such as Turbo enable/disable done behind its back by a BMC at run-time.
> (if this is even possible)
>
>
> --
> Len Brown, Intel Open Source Technology Center
diff mbox

Patch

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index 5d67780..0ff3a4b 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2025,10 +2025,10 @@  static int intel_pstate_set_policy(struct cpufreq_policy *policy)
 		 */
 		intel_pstate_clear_update_util_hook(policy->cpu);
 		intel_pstate_max_within_limits(cpu);
+	} else {
+		intel_pstate_set_update_util_hook(policy->cpu);
 	}
 
-	intel_pstate_set_update_util_hook(policy->cpu);
-
 	if (hwp_active)
 		intel_pstate_hwp_set(policy->cpu);