diff mbox series

[8/8] cpufreq: amd-pstate: Drop some uses of cpudata->hw_prefcore

Message ID 20240826211358.2694603-9-superm1@kernel.org (mailing list archive)
State Superseded, archived
Delegated to: Mario Limonciello
Headers show
Series Adjustments for preferred core detection | expand

Commit Message

Mario Limonciello Aug. 26, 2024, 9:13 p.m. UTC
From: Mario Limonciello <mario.limonciello@amd.com>

As the global variable is cleared when preferred cores is not present
the local variable use isn't needed in many functions.
Drop it where possible.  No intended functional changes.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/cpufreq/amd-pstate.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

Comments

Yuan, Perry Aug. 27, 2024, 6:53 a.m. UTC | #1
[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: Mario Limonciello <superm1@kernel.org>
> Sent: Tuesday, August 27, 2024 5:14 AM
> To: Borislav Petkov <bp@alien8.de>; Shenoy, Gautham Ranjal
> <gautham.shenoy@amd.com>; Yuan, Perry <Perry.Yuan@amd.com>
> Cc: maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT) <x86@kernel.org>;
> Rafael J . Wysocki <rafael@kernel.org>; open list:X86 ARCHITECTURE (32-BIT
> AND 64-BIT) <linux-kernel@vger.kernel.org>; open list:ACPI <linux-
> acpi@vger.kernel.org>; open list:CPU FREQUENCY SCALING FRAMEWORK
> <linux-pm@vger.kernel.org>; Limonciello, Mario
> <Mario.Limonciello@amd.com>
> Subject: [PATCH 8/8] cpufreq: amd-pstate: Drop some uses of cpudata-
> >hw_prefcore
>
> From: Mario Limonciello <mario.limonciello@amd.com>
>
> As the global variable is cleared when preferred cores is not present the local
> variable use isn't needed in many functions.
> Drop it where possible.  No intended functional changes.
>
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
>  drivers/cpufreq/amd-pstate.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index ed05d7a0add10..257e28e549bd1 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -1112,12 +1112,7 @@ static ssize_t
> show_amd_pstate_prefcore_ranking(struct cpufreq_policy *policy,  static
> ssize_t show_amd_pstate_hw_prefcore(struct cpufreq_policy *policy,
>                                          char *buf)
>  {
> -     bool hw_prefcore;
> -     struct amd_cpudata *cpudata = policy->driver_data;
> -
> -     hw_prefcore = READ_ONCE(cpudata->hw_prefcore);
> -
> -     return sysfs_emit(buf, "%s\n", str_enabled_disabled(hw_prefcore));
> +     return sysfs_emit(buf, "%s\n",
> +str_enabled_disabled(amd_pstate_prefcore));
>  }
>
>  static ssize_t show_energy_performance_available_preferences(
> --
> 2.43.0
>

Reviewed-by: Perry Yuan <perry.yuan@amd.com>


Best Regards.

Perry.
Gautham R. Shenoy Aug. 27, 2024, 5:03 p.m. UTC | #2
On Mon, Aug 26, 2024 at 04:13:58PM -0500, Mario Limonciello wrote:
> From: Mario Limonciello <mario.limonciello@amd.com>
> 
> As the global variable is cleared when preferred cores is not present
> the local variable use isn't needed in many functions.
> Drop it where possible.  No intended functional changes.
> 
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
>  drivers/cpufreq/amd-pstate.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index ed05d7a0add10..257e28e549bd1 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -1112,12 +1112,7 @@ static ssize_t show_amd_pstate_prefcore_ranking(struct cpufreq_policy *policy,
>  static ssize_t show_amd_pstate_hw_prefcore(struct cpufreq_policy *policy,
>  					   char *buf)
>  {
> -	bool hw_prefcore;
> -	struct amd_cpudata *cpudata = policy->driver_data;
> -
> -	hw_prefcore = READ_ONCE(cpudata->hw_prefcore);
> -
> -	return sysfs_emit(buf, "%s\n", str_enabled_disabled(hw_prefcore));
> +	return sysfs_emit(buf, "%s\n", str_enabled_disabled(amd_pstate_prefcore));


If the user boots with "amd_prefcore=disable" on a system that
supports preferred-core, pror to this patchset, cpudata->hw_prefcore
would be true and thus, amd_pstate_hw_prefcore would show "enabled".

With this patchset, it would show "disabled". Or am I missing something?



>  }
>  
>  static ssize_t show_energy_performance_available_preferences(
> -- 
> 2.43.0
> 

--
Thanks and Regards
gautham.
Mario Limonciello Aug. 27, 2024, 7:16 p.m. UTC | #3
On 8/27/2024 12:03, Gautham R. Shenoy wrote:
> On Mon, Aug 26, 2024 at 04:13:58PM -0500, Mario Limonciello wrote:
>> From: Mario Limonciello <mario.limonciello@amd.com>
>>
>> As the global variable is cleared when preferred cores is not present
>> the local variable use isn't needed in many functions.
>> Drop it where possible.  No intended functional changes.
>>
>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>> ---
>>   drivers/cpufreq/amd-pstate.c | 7 +------
>>   1 file changed, 1 insertion(+), 6 deletions(-)
>>
>> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
>> index ed05d7a0add10..257e28e549bd1 100644
>> --- a/drivers/cpufreq/amd-pstate.c
>> +++ b/drivers/cpufreq/amd-pstate.c
>> @@ -1112,12 +1112,7 @@ static ssize_t show_amd_pstate_prefcore_ranking(struct cpufreq_policy *policy,
>>   static ssize_t show_amd_pstate_hw_prefcore(struct cpufreq_policy *policy,
>>   					   char *buf)
>>   {
>> -	bool hw_prefcore;
>> -	struct amd_cpudata *cpudata = policy->driver_data;
>> -
>> -	hw_prefcore = READ_ONCE(cpudata->hw_prefcore);
>> -
>> -	return sysfs_emit(buf, "%s\n", str_enabled_disabled(hw_prefcore));
>> +	return sysfs_emit(buf, "%s\n", str_enabled_disabled(amd_pstate_prefcore));
> 
> 
> If the user boots with "amd_prefcore=disable" on a system that
> supports preferred-core, pror to this patchset, cpudata->hw_prefcore
> would be true and thus, amd_pstate_hw_prefcore would show "enabled".
> 

Yes; you're right about the previous behavior.

> With this patchset, it would show "disabled". Or am I missing something?

This appears to be another case we don't have documentation for the 
sysfs file `amd_pstate_hw_prefcore`.

I had thought this was a malfunction in the behavior that it reflected 
the current status, not the hardware /capability/.

Which one makes more sense for userspace?  In my mind the most likely 
consumer of this information would be something a sched_ext based 
userspace scheduler.  They would need to know whether the scheduler was 
using preferred cores; not whether the hardware supported it.

Whichever direction we agree to go; I'll add documentation for v2.

> 
> 
> 
>>   }
>>   
>>   static ssize_t show_energy_performance_available_preferences(
>> -- 
>> 2.43.0
>>
> 
> --
> Thanks and Regards
> gautham.
Gautham R. Shenoy Aug. 28, 2024, 5:08 a.m. UTC | #4
On Tue, Aug 27, 2024 at 02:16:51PM -0500, Mario Limonciello wrote:
> On 8/27/2024 12:03, Gautham R. Shenoy wrote:
> > On Mon, Aug 26, 2024 at 04:13:58PM -0500, Mario Limonciello wrote:
> > > From: Mario Limonciello <mario.limonciello@amd.com>
> > > 
> > > As the global variable is cleared when preferred cores is not present
> > > the local variable use isn't needed in many functions.
> > > Drop it where possible.  No intended functional changes.
> > > 
> > > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> > > ---
> > >   drivers/cpufreq/amd-pstate.c | 7 +------
> > >   1 file changed, 1 insertion(+), 6 deletions(-)
> > > 
> > > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> > > index ed05d7a0add10..257e28e549bd1 100644
> > > --- a/drivers/cpufreq/amd-pstate.c
> > > +++ b/drivers/cpufreq/amd-pstate.c
> > > @@ -1112,12 +1112,7 @@ static ssize_t show_amd_pstate_prefcore_ranking(struct cpufreq_policy *policy,
> > >   static ssize_t show_amd_pstate_hw_prefcore(struct cpufreq_policy *policy,
> > >   					   char *buf)
> > >   {
> > > -	bool hw_prefcore;
> > > -	struct amd_cpudata *cpudata = policy->driver_data;
> > > -
> > > -	hw_prefcore = READ_ONCE(cpudata->hw_prefcore);
> > > -
> > > -	return sysfs_emit(buf, "%s\n", str_enabled_disabled(hw_prefcore));
> > > +	return sysfs_emit(buf, "%s\n", str_enabled_disabled(amd_pstate_prefcore));
> > 
> > 
> > If the user boots with "amd_prefcore=disable" on a system that
> > supports preferred-core, pror to this patchset, cpudata->hw_prefcore
> > would be true and thus, amd_pstate_hw_prefcore would show "enabled".
> > 
> 
> Yes; you're right about the previous behavior.
> 
> > With this patchset, it would show "disabled". Or am I missing something?
> 
> This appears to be another case we don't have documentation for the sysfs
> file `amd_pstate_hw_prefcore`.

:(

> 
> I had thought this was a malfunction in the behavior that it reflected the
> current status, not the hardware /capability/.
> 
> Which one makes more sense for userspace?  In my mind the most likely
> consumer of this information would be something a sched_ext based userspace
> scheduler.  They would need to know whether the scheduler was using
> preferred cores; not whether the hardware supported it.

The commandline parameter currently impacts only the fair sched-class
tasks since the preference information gets used only during
load-balancing.

IMO, the same should continue with sched-ext, i.e. if the user has
explicitly disabled prefcore support via commandline, the no sched-ext
scheduler should use the preference information to make task placement
decisions. However, I would like to see what the sched-ext folks have
to say. Adding some of them to the Cc list.

> 
> Whichever direction we agree to go; I'll add documentation for v2.

Yes please.

--
Thanks and Regards
gautham.
Andrea Righi Aug. 28, 2024, 6:20 a.m. UTC | #5
On Wed, Aug 28, 2024 at 10:38:45AM +0530, Gautham R. Shenoy wrote:
...
> > I had thought this was a malfunction in the behavior that it reflected the
> > current status, not the hardware /capability/.
> > 
> > Which one makes more sense for userspace?  In my mind the most likely
> > consumer of this information would be something a sched_ext based userspace
> > scheduler.  They would need to know whether the scheduler was using
> > preferred cores; not whether the hardware supported it.
> 
> The commandline parameter currently impacts only the fair sched-class
> tasks since the preference information gets used only during
> load-balancing.
> 
> IMO, the same should continue with sched-ext, i.e. if the user has
> explicitly disabled prefcore support via commandline, the no sched-ext
> scheduler should use the preference information to make task placement
> decisions. However, I would like to see what the sched-ext folks have
> to say. Adding some of them to the Cc list.

IMHO it makes more sense to reflect the real state of prefcore support
from a "system" perspective, more than a "hardware" perspective, so if
it's disabled via boot command line it should show disabled.

From a user-space scheduler perspective we should be fine either way, as
long as the ABI is clearly documented, since we also have access to
/proc/cmdline and we would be able to figure out if the user has
disabled it via cmdline (however, the preference is still to report the
actual system status).

Question: having prefcore enabled affects also the value of
scaling_max_freq? Like an `lscpu -e`, for example, would show a higher
max frequency for the specific preferred cores? (this is another useful
information from a sched_ext scheduler perspective).

Thanks,
-Andrea
Gautham R. Shenoy Aug. 28, 2024, 2:57 p.m. UTC | #6
Hello Andrea,

On Wed, Aug 28, 2024 at 08:20:50AM +0200, Andrea Righi wrote:
> On Wed, Aug 28, 2024 at 10:38:45AM +0530, Gautham R. Shenoy wrote:
> ...
> > > I had thought this was a malfunction in the behavior that it reflected the
> > > current status, not the hardware /capability/.
> > > 
> > > Which one makes more sense for userspace?  In my mind the most likely
> > > consumer of this information would be something a sched_ext based userspace
> > > scheduler.  They would need to know whether the scheduler was using
> > > preferred cores; not whether the hardware supported it.
> > 
> > The commandline parameter currently impacts only the fair sched-class
> > tasks since the preference information gets used only during
> > load-balancing.
> > 
> > IMO, the same should continue with sched-ext, i.e. if the user has
> > explicitly disabled prefcore support via commandline, the no sched-ext
> > scheduler should use the preference information to make task placement
> > decisions. However, I would like to see what the sched-ext folks have
> > to say. Adding some of them to the Cc list.
> 
> IMHO it makes more sense to reflect the real state of prefcore support
> from a "system" perspective, more than a "hardware" perspective, so if
> it's disabled via boot command line it should show disabled.
> 
> From a user-space scheduler perspective we should be fine either way, as
> long as the ABI is clearly documented, since we also have access to
> /proc/cmdline and we would be able to figure out if the user has
> disabled it via cmdline (however, the preference is still to report the
> actual system status).

Thank you for confirming this.

> 
> Question: having prefcore enabled affects also the value of
> scaling_max_freq? Like an `lscpu -e`, for example, would show a higher
> max frequency for the specific preferred cores? (this is another useful
> information from a sched_ext scheduler perspective).

Since the scaling_max_freq is computed based on the boost-numerator,
at least from this patchset, the numerator would be the same across
all kinds of cores, and thus the scaling_max_freq reported will be the
same across all the cores.

> 
> Thanks,
> -Andrea

--
Thanks and Regards
gautham.
Andrea Righi Aug. 29, 2024, 12:52 p.m. UTC | #7
On Wed, Aug 28, 2024 at 08:27:44PM +0530, Gautham R. Shenoy wrote:
> Hello Andrea,
> 
> On Wed, Aug 28, 2024 at 08:20:50AM +0200, Andrea Righi wrote:
> > On Wed, Aug 28, 2024 at 10:38:45AM +0530, Gautham R. Shenoy wrote:
> > ...
> > > > I had thought this was a malfunction in the behavior that it reflected the
> > > > current status, not the hardware /capability/.
> > > > 
> > > > Which one makes more sense for userspace?  In my mind the most likely
> > > > consumer of this information would be something a sched_ext based userspace
> > > > scheduler.  They would need to know whether the scheduler was using
> > > > preferred cores; not whether the hardware supported it.
> > > 
> > > The commandline parameter currently impacts only the fair sched-class
> > > tasks since the preference information gets used only during
> > > load-balancing.
> > > 
> > > IMO, the same should continue with sched-ext, i.e. if the user has
> > > explicitly disabled prefcore support via commandline, the no sched-ext
> > > scheduler should use the preference information to make task placement
> > > decisions. However, I would like to see what the sched-ext folks have
> > > to say. Adding some of them to the Cc list.
> > 
> > IMHO it makes more sense to reflect the real state of prefcore support
> > from a "system" perspective, more than a "hardware" perspective, so if
> > it's disabled via boot command line it should show disabled.
> > 
> > From a user-space scheduler perspective we should be fine either way, as
> > long as the ABI is clearly documented, since we also have access to
> > /proc/cmdline and we would be able to figure out if the user has
> > disabled it via cmdline (however, the preference is still to report the
> > actual system status).
> 
> Thank you for confirming this.
> 
> > 
> > Question: having prefcore enabled affects also the value of
> > scaling_max_freq? Like an `lscpu -e`, for example, would show a higher
> > max frequency for the specific preferred cores? (this is another useful
> > information from a sched_ext scheduler perspective).
> 
> Since the scaling_max_freq is computed based on the boost-numerator,
> at least from this patchset, the numerator would be the same across
> all kinds of cores, and thus the scaling_max_freq reported will be the
> same across all the cores.

I see, so IIUC from user-space the most reliable way to detect the
fastest cores is to check amd_pstate_highest_perf / amd_pstate_max_freq,
right? I'm trying to figure out a way to abstract and generalize the
concept of "fast cores" in sched_ext.

Also, is this something that has changed recently? I see this on an
AMD Ryzen Threadripper PRO 7975WX 32-Cores running a 6.8 kernel:

$ uname -r
6.8.0-40-generic

$ grep . /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq
/sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu4/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu6/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu7/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu8/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu9/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu10/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu11/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu12/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu13/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu14/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu15/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu16/cpufreq/cpuinfo_max_freq:6161000
/sys/devices/system/cpu/cpu17/cpufreq/cpuinfo_max_freq:6321000
/sys/devices/system/cpu/cpu18/cpufreq/cpuinfo_max_freq:6001000
/sys/devices/system/cpu/cpu19/cpufreq/cpuinfo_max_freq:6646000
/sys/devices/system/cpu/cpu20/cpufreq/cpuinfo_max_freq:5837000
/sys/devices/system/cpu/cpu21/cpufreq/cpuinfo_max_freq:6482000
/sys/devices/system/cpu/cpu22/cpufreq/cpuinfo_max_freq:5517000
/sys/devices/system/cpu/cpu23/cpufreq/cpuinfo_max_freq:5677000
/sys/devices/system/cpu/cpu24/cpufreq/cpuinfo_max_freq:6966000
/sys/devices/system/cpu/cpu25/cpufreq/cpuinfo_max_freq:7775000
/sys/devices/system/cpu/cpu26/cpufreq/cpuinfo_max_freq:6806000
/sys/devices/system/cpu/cpu27/cpufreq/cpuinfo_max_freq:7775000
/sys/devices/system/cpu/cpu28/cpufreq/cpuinfo_max_freq:7130000
/sys/devices/system/cpu/cpu29/cpufreq/cpuinfo_max_freq:7451000
/sys/devices/system/cpu/cpu30/cpufreq/cpuinfo_max_freq:7290000
/sys/devices/system/cpu/cpu31/cpufreq/cpuinfo_max_freq:7611000
/sys/devices/system/cpu/cpu32/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu33/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu34/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu35/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu36/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu37/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu38/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu39/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu40/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu41/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu42/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu43/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu44/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu45/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu46/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu47/cpufreq/cpuinfo_max_freq:5352000
/sys/devices/system/cpu/cpu48/cpufreq/cpuinfo_max_freq:6161000
/sys/devices/system/cpu/cpu49/cpufreq/cpuinfo_max_freq:6321000
/sys/devices/system/cpu/cpu50/cpufreq/cpuinfo_max_freq:6001000
/sys/devices/system/cpu/cpu51/cpufreq/cpuinfo_max_freq:6646000
/sys/devices/system/cpu/cpu52/cpufreq/cpuinfo_max_freq:5837000
/sys/devices/system/cpu/cpu53/cpufreq/cpuinfo_max_freq:6482000
/sys/devices/system/cpu/cpu54/cpufreq/cpuinfo_max_freq:5517000
/sys/devices/system/cpu/cpu55/cpufreq/cpuinfo_max_freq:5677000
/sys/devices/system/cpu/cpu56/cpufreq/cpuinfo_max_freq:6966000
/sys/devices/system/cpu/cpu57/cpufreq/cpuinfo_max_freq:7775000
/sys/devices/system/cpu/cpu58/cpufreq/cpuinfo_max_freq:6806000
/sys/devices/system/cpu/cpu59/cpufreq/cpuinfo_max_freq:7775000
/sys/devices/system/cpu/cpu60/cpufreq/cpuinfo_max_freq:7130000
/sys/devices/system/cpu/cpu61/cpufreq/cpuinfo_max_freq:7451000
/sys/devices/system/cpu/cpu62/cpufreq/cpuinfo_max_freq:7290000
/sys/devices/system/cpu/cpu63/cpufreq/cpuinfo_max_freq:7611000

Thanks,
-Andrea
Mario Limonciello Aug. 29, 2024, 1:01 p.m. UTC | #8
On 8/29/2024 07:52, Andrea Righi wrote:
> On Wed, Aug 28, 2024 at 08:27:44PM +0530, Gautham R. Shenoy wrote:
>> Hello Andrea,
>>
>> On Wed, Aug 28, 2024 at 08:20:50AM +0200, Andrea Righi wrote:
>>> On Wed, Aug 28, 2024 at 10:38:45AM +0530, Gautham R. Shenoy wrote:
>>> ...
>>>>> I had thought this was a malfunction in the behavior that it reflected the
>>>>> current status, not the hardware /capability/.
>>>>>
>>>>> Which one makes more sense for userspace?  In my mind the most likely
>>>>> consumer of this information would be something a sched_ext based userspace
>>>>> scheduler.  They would need to know whether the scheduler was using
>>>>> preferred cores; not whether the hardware supported it.
>>>>
>>>> The commandline parameter currently impacts only the fair sched-class
>>>> tasks since the preference information gets used only during
>>>> load-balancing.
>>>>
>>>> IMO, the same should continue with sched-ext, i.e. if the user has
>>>> explicitly disabled prefcore support via commandline, the no sched-ext
>>>> scheduler should use the preference information to make task placement
>>>> decisions. However, I would like to see what the sched-ext folks have
>>>> to say. Adding some of them to the Cc list.
>>>
>>> IMHO it makes more sense to reflect the real state of prefcore support
>>> from a "system" perspective, more than a "hardware" perspective, so if
>>> it's disabled via boot command line it should show disabled.
>>>
>>>  From a user-space scheduler perspective we should be fine either way, as
>>> long as the ABI is clearly documented, since we also have access to
>>> /proc/cmdline and we would be able to figure out if the user has
>>> disabled it via cmdline (however, the preference is still to report the
>>> actual system status).
>>
>> Thank you for confirming this.
>>
>>>
>>> Question: having prefcore enabled affects also the value of
>>> scaling_max_freq? Like an `lscpu -e`, for example, would show a higher
>>> max frequency for the specific preferred cores? (this is another useful
>>> information from a sched_ext scheduler perspective).
>>
>> Since the scaling_max_freq is computed based on the boost-numerator,
>> at least from this patchset, the numerator would be the same across
>> all kinds of cores, and thus the scaling_max_freq reported will be the
>> same across all the cores.
> 
> I see, so IIUC from user-space the most reliable way to detect the
> fastest cores is to check amd_pstate_highest_perf / amd_pstate_max_freq,
> right? I'm trying to figure out a way to abstract and generalize the
> concept of "fast cores" in sched_ext.

Right now the best way to do this is to look at the 
amd_pstate_precore_ranking file.

In this series there has been some discussion of dropping it though in 
favor of looking at the highest perf file.  I don't believe we're 
concluded one way or another on it yet though.

> 
> Also, is this something that has changed recently? I see this on an
> AMD Ryzen Threadripper PRO 7975WX 32-Cores running a 6.8 kernel:
> 
> $ uname -r
> 6.8.0-40-generic

You're missing the preferred core patches on this kernel.  They landed 
in 6.9, it's better to upgrade to 6.10.y or 6.11-rc.
Andrea Righi Aug. 29, 2024, 3:16 p.m. UTC | #9
On Thu, Aug 29, 2024 at 08:01:48AM -0500, Mario Limonciello wrote:
> On 8/29/2024 07:52, Andrea Righi wrote:
> > On Wed, Aug 28, 2024 at 08:27:44PM +0530, Gautham R. Shenoy wrote:
> > > Hello Andrea,
> > > 
> > > On Wed, Aug 28, 2024 at 08:20:50AM +0200, Andrea Righi wrote:
> > > > On Wed, Aug 28, 2024 at 10:38:45AM +0530, Gautham R. Shenoy wrote:
> > > > ...
> > > > > > I had thought this was a malfunction in the behavior that it reflected the
> > > > > > current status, not the hardware /capability/.
> > > > > > 
> > > > > > Which one makes more sense for userspace?  In my mind the most likely
> > > > > > consumer of this information would be something a sched_ext based userspace
> > > > > > scheduler.  They would need to know whether the scheduler was using
> > > > > > preferred cores; not whether the hardware supported it.
> > > > > 
> > > > > The commandline parameter currently impacts only the fair sched-class
> > > > > tasks since the preference information gets used only during
> > > > > load-balancing.
> > > > > 
> > > > > IMO, the same should continue with sched-ext, i.e. if the user has
> > > > > explicitly disabled prefcore support via commandline, the no sched-ext
> > > > > scheduler should use the preference information to make task placement
> > > > > decisions. However, I would like to see what the sched-ext folks have
> > > > > to say. Adding some of them to the Cc list.
> > > > 
> > > > IMHO it makes more sense to reflect the real state of prefcore support
> > > > from a "system" perspective, more than a "hardware" perspective, so if
> > > > it's disabled via boot command line it should show disabled.
> > > > 
> > > >  From a user-space scheduler perspective we should be fine either way, as
> > > > long as the ABI is clearly documented, since we also have access to
> > > > /proc/cmdline and we would be able to figure out if the user has
> > > > disabled it via cmdline (however, the preference is still to report the
> > > > actual system status).
> > > 
> > > Thank you for confirming this.
> > > 
> > > > 
> > > > Question: having prefcore enabled affects also the value of
> > > > scaling_max_freq? Like an `lscpu -e`, for example, would show a higher
> > > > max frequency for the specific preferred cores? (this is another useful
> > > > information from a sched_ext scheduler perspective).
> > > 
> > > Since the scaling_max_freq is computed based on the boost-numerator,
> > > at least from this patchset, the numerator would be the same across
> > > all kinds of cores, and thus the scaling_max_freq reported will be the
> > > same across all the cores.
> > 
> > I see, so IIUC from user-space the most reliable way to detect the
> > fastest cores is to check amd_pstate_highest_perf / amd_pstate_max_freq,
> > right? I'm trying to figure out a way to abstract and generalize the
> > concept of "fast cores" in sched_ext.
> 
> Right now the best way to do this is to look at the
> amd_pstate_precore_ranking file.

Ok.

> 
> In this series there has been some discussion of dropping it though in favor
> of looking at the highest perf file.  I don't believe we're concluded one
> way or another on it yet though.
> 
> > 
> > Also, is this something that has changed recently? I see this on an
> > AMD Ryzen Threadripper PRO 7975WX 32-Cores running a 6.8 kernel:
> > 
> > $ uname -r
> > 6.8.0-40-generic
> 
> You're missing the preferred core patches on this kernel.  They landed in
> 6.9, it's better to upgrade to 6.10.y or 6.11-rc.

So, if I move to 6.9+ I should see the same max frequency across all the
CPUs and I can use amd_pstate_precore_ranking to determine the subset of
fast cores.

Thanks for the clarification.

-Andrea
diff mbox series

Patch

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index ed05d7a0add10..257e28e549bd1 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -1112,12 +1112,7 @@  static ssize_t show_amd_pstate_prefcore_ranking(struct cpufreq_policy *policy,
 static ssize_t show_amd_pstate_hw_prefcore(struct cpufreq_policy *policy,
 					   char *buf)
 {
-	bool hw_prefcore;
-	struct amd_cpudata *cpudata = policy->driver_data;
-
-	hw_prefcore = READ_ONCE(cpudata->hw_prefcore);
-
-	return sysfs_emit(buf, "%s\n", str_enabled_disabled(hw_prefcore));
+	return sysfs_emit(buf, "%s\n", str_enabled_disabled(amd_pstate_prefcore));
 }
 
 static ssize_t show_energy_performance_available_preferences(