Message ID | 20240826211358.2694603-9-superm1@kernel.org (mailing list archive) |
---|---|
State | Superseded, archived |
Delegated to: | Mario Limonciello |
Headers | show |
Series | Adjustments for preferred core detection | expand |
[AMD Official Use Only - AMD Internal Distribution Only] > -----Original Message----- > From: Mario Limonciello <superm1@kernel.org> > Sent: Tuesday, August 27, 2024 5:14 AM > To: Borislav Petkov <bp@alien8.de>; Shenoy, Gautham Ranjal > <gautham.shenoy@amd.com>; Yuan, Perry <Perry.Yuan@amd.com> > Cc: maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT) <x86@kernel.org>; > Rafael J . Wysocki <rafael@kernel.org>; open list:X86 ARCHITECTURE (32-BIT > AND 64-BIT) <linux-kernel@vger.kernel.org>; open list:ACPI <linux- > acpi@vger.kernel.org>; open list:CPU FREQUENCY SCALING FRAMEWORK > <linux-pm@vger.kernel.org>; Limonciello, Mario > <Mario.Limonciello@amd.com> > Subject: [PATCH 8/8] cpufreq: amd-pstate: Drop some uses of cpudata- > >hw_prefcore > > From: Mario Limonciello <mario.limonciello@amd.com> > > As the global variable is cleared when preferred cores is not present the local > variable use isn't needed in many functions. > Drop it where possible. No intended functional changes. > > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> > --- > drivers/cpufreq/amd-pstate.c | 7 +------ > 1 file changed, 1 insertion(+), 6 deletions(-) > > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c > index ed05d7a0add10..257e28e549bd1 100644 > --- a/drivers/cpufreq/amd-pstate.c > +++ b/drivers/cpufreq/amd-pstate.c > @@ -1112,12 +1112,7 @@ static ssize_t > show_amd_pstate_prefcore_ranking(struct cpufreq_policy *policy, static > ssize_t show_amd_pstate_hw_prefcore(struct cpufreq_policy *policy, > char *buf) > { > - bool hw_prefcore; > - struct amd_cpudata *cpudata = policy->driver_data; > - > - hw_prefcore = READ_ONCE(cpudata->hw_prefcore); > - > - return sysfs_emit(buf, "%s\n", str_enabled_disabled(hw_prefcore)); > + return sysfs_emit(buf, "%s\n", > +str_enabled_disabled(amd_pstate_prefcore)); > } > > static ssize_t show_energy_performance_available_preferences( > -- > 2.43.0 > Reviewed-by: Perry Yuan <perry.yuan@amd.com> Best Regards. Perry.
On Mon, Aug 26, 2024 at 04:13:58PM -0500, Mario Limonciello wrote: > From: Mario Limonciello <mario.limonciello@amd.com> > > As the global variable is cleared when preferred cores is not present > the local variable use isn't needed in many functions. > Drop it where possible. No intended functional changes. > > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> > --- > drivers/cpufreq/amd-pstate.c | 7 +------ > 1 file changed, 1 insertion(+), 6 deletions(-) > > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c > index ed05d7a0add10..257e28e549bd1 100644 > --- a/drivers/cpufreq/amd-pstate.c > +++ b/drivers/cpufreq/amd-pstate.c > @@ -1112,12 +1112,7 @@ static ssize_t show_amd_pstate_prefcore_ranking(struct cpufreq_policy *policy, > static ssize_t show_amd_pstate_hw_prefcore(struct cpufreq_policy *policy, > char *buf) > { > - bool hw_prefcore; > - struct amd_cpudata *cpudata = policy->driver_data; > - > - hw_prefcore = READ_ONCE(cpudata->hw_prefcore); > - > - return sysfs_emit(buf, "%s\n", str_enabled_disabled(hw_prefcore)); > + return sysfs_emit(buf, "%s\n", str_enabled_disabled(amd_pstate_prefcore)); If the user boots with "amd_prefcore=disable" on a system that supports preferred-core, pror to this patchset, cpudata->hw_prefcore would be true and thus, amd_pstate_hw_prefcore would show "enabled". With this patchset, it would show "disabled". Or am I missing something? > } > > static ssize_t show_energy_performance_available_preferences( > -- > 2.43.0 > -- Thanks and Regards gautham.
On 8/27/2024 12:03, Gautham R. Shenoy wrote: > On Mon, Aug 26, 2024 at 04:13:58PM -0500, Mario Limonciello wrote: >> From: Mario Limonciello <mario.limonciello@amd.com> >> >> As the global variable is cleared when preferred cores is not present >> the local variable use isn't needed in many functions. >> Drop it where possible. No intended functional changes. >> >> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> >> --- >> drivers/cpufreq/amd-pstate.c | 7 +------ >> 1 file changed, 1 insertion(+), 6 deletions(-) >> >> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c >> index ed05d7a0add10..257e28e549bd1 100644 >> --- a/drivers/cpufreq/amd-pstate.c >> +++ b/drivers/cpufreq/amd-pstate.c >> @@ -1112,12 +1112,7 @@ static ssize_t show_amd_pstate_prefcore_ranking(struct cpufreq_policy *policy, >> static ssize_t show_amd_pstate_hw_prefcore(struct cpufreq_policy *policy, >> char *buf) >> { >> - bool hw_prefcore; >> - struct amd_cpudata *cpudata = policy->driver_data; >> - >> - hw_prefcore = READ_ONCE(cpudata->hw_prefcore); >> - >> - return sysfs_emit(buf, "%s\n", str_enabled_disabled(hw_prefcore)); >> + return sysfs_emit(buf, "%s\n", str_enabled_disabled(amd_pstate_prefcore)); > > > If the user boots with "amd_prefcore=disable" on a system that > supports preferred-core, pror to this patchset, cpudata->hw_prefcore > would be true and thus, amd_pstate_hw_prefcore would show "enabled". > Yes; you're right about the previous behavior. > With this patchset, it would show "disabled". Or am I missing something? This appears to be another case we don't have documentation for the sysfs file `amd_pstate_hw_prefcore`. I had thought this was a malfunction in the behavior that it reflected the current status, not the hardware /capability/. Which one makes more sense for userspace? In my mind the most likely consumer of this information would be something a sched_ext based userspace scheduler. They would need to know whether the scheduler was using preferred cores; not whether the hardware supported it. Whichever direction we agree to go; I'll add documentation for v2. > > > >> } >> >> static ssize_t show_energy_performance_available_preferences( >> -- >> 2.43.0 >> > > -- > Thanks and Regards > gautham.
On Tue, Aug 27, 2024 at 02:16:51PM -0500, Mario Limonciello wrote: > On 8/27/2024 12:03, Gautham R. Shenoy wrote: > > On Mon, Aug 26, 2024 at 04:13:58PM -0500, Mario Limonciello wrote: > > > From: Mario Limonciello <mario.limonciello@amd.com> > > > > > > As the global variable is cleared when preferred cores is not present > > > the local variable use isn't needed in many functions. > > > Drop it where possible. No intended functional changes. > > > > > > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> > > > --- > > > drivers/cpufreq/amd-pstate.c | 7 +------ > > > 1 file changed, 1 insertion(+), 6 deletions(-) > > > > > > diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c > > > index ed05d7a0add10..257e28e549bd1 100644 > > > --- a/drivers/cpufreq/amd-pstate.c > > > +++ b/drivers/cpufreq/amd-pstate.c > > > @@ -1112,12 +1112,7 @@ static ssize_t show_amd_pstate_prefcore_ranking(struct cpufreq_policy *policy, > > > static ssize_t show_amd_pstate_hw_prefcore(struct cpufreq_policy *policy, > > > char *buf) > > > { > > > - bool hw_prefcore; > > > - struct amd_cpudata *cpudata = policy->driver_data; > > > - > > > - hw_prefcore = READ_ONCE(cpudata->hw_prefcore); > > > - > > > - return sysfs_emit(buf, "%s\n", str_enabled_disabled(hw_prefcore)); > > > + return sysfs_emit(buf, "%s\n", str_enabled_disabled(amd_pstate_prefcore)); > > > > > > If the user boots with "amd_prefcore=disable" on a system that > > supports preferred-core, pror to this patchset, cpudata->hw_prefcore > > would be true and thus, amd_pstate_hw_prefcore would show "enabled". > > > > Yes; you're right about the previous behavior. > > > With this patchset, it would show "disabled". Or am I missing something? > > This appears to be another case we don't have documentation for the sysfs > file `amd_pstate_hw_prefcore`. :( > > I had thought this was a malfunction in the behavior that it reflected the > current status, not the hardware /capability/. > > Which one makes more sense for userspace? In my mind the most likely > consumer of this information would be something a sched_ext based userspace > scheduler. They would need to know whether the scheduler was using > preferred cores; not whether the hardware supported it. The commandline parameter currently impacts only the fair sched-class tasks since the preference information gets used only during load-balancing. IMO, the same should continue with sched-ext, i.e. if the user has explicitly disabled prefcore support via commandline, the no sched-ext scheduler should use the preference information to make task placement decisions. However, I would like to see what the sched-ext folks have to say. Adding some of them to the Cc list. > > Whichever direction we agree to go; I'll add documentation for v2. Yes please. -- Thanks and Regards gautham.
On Wed, Aug 28, 2024 at 10:38:45AM +0530, Gautham R. Shenoy wrote: ... > > I had thought this was a malfunction in the behavior that it reflected the > > current status, not the hardware /capability/. > > > > Which one makes more sense for userspace? In my mind the most likely > > consumer of this information would be something a sched_ext based userspace > > scheduler. They would need to know whether the scheduler was using > > preferred cores; not whether the hardware supported it. > > The commandline parameter currently impacts only the fair sched-class > tasks since the preference information gets used only during > load-balancing. > > IMO, the same should continue with sched-ext, i.e. if the user has > explicitly disabled prefcore support via commandline, the no sched-ext > scheduler should use the preference information to make task placement > decisions. However, I would like to see what the sched-ext folks have > to say. Adding some of them to the Cc list. IMHO it makes more sense to reflect the real state of prefcore support from a "system" perspective, more than a "hardware" perspective, so if it's disabled via boot command line it should show disabled. From a user-space scheduler perspective we should be fine either way, as long as the ABI is clearly documented, since we also have access to /proc/cmdline and we would be able to figure out if the user has disabled it via cmdline (however, the preference is still to report the actual system status). Question: having prefcore enabled affects also the value of scaling_max_freq? Like an `lscpu -e`, for example, would show a higher max frequency for the specific preferred cores? (this is another useful information from a sched_ext scheduler perspective). Thanks, -Andrea
Hello Andrea, On Wed, Aug 28, 2024 at 08:20:50AM +0200, Andrea Righi wrote: > On Wed, Aug 28, 2024 at 10:38:45AM +0530, Gautham R. Shenoy wrote: > ... > > > I had thought this was a malfunction in the behavior that it reflected the > > > current status, not the hardware /capability/. > > > > > > Which one makes more sense for userspace? In my mind the most likely > > > consumer of this information would be something a sched_ext based userspace > > > scheduler. They would need to know whether the scheduler was using > > > preferred cores; not whether the hardware supported it. > > > > The commandline parameter currently impacts only the fair sched-class > > tasks since the preference information gets used only during > > load-balancing. > > > > IMO, the same should continue with sched-ext, i.e. if the user has > > explicitly disabled prefcore support via commandline, the no sched-ext > > scheduler should use the preference information to make task placement > > decisions. However, I would like to see what the sched-ext folks have > > to say. Adding some of them to the Cc list. > > IMHO it makes more sense to reflect the real state of prefcore support > from a "system" perspective, more than a "hardware" perspective, so if > it's disabled via boot command line it should show disabled. > > From a user-space scheduler perspective we should be fine either way, as > long as the ABI is clearly documented, since we also have access to > /proc/cmdline and we would be able to figure out if the user has > disabled it via cmdline (however, the preference is still to report the > actual system status). Thank you for confirming this. > > Question: having prefcore enabled affects also the value of > scaling_max_freq? Like an `lscpu -e`, for example, would show a higher > max frequency for the specific preferred cores? (this is another useful > information from a sched_ext scheduler perspective). Since the scaling_max_freq is computed based on the boost-numerator, at least from this patchset, the numerator would be the same across all kinds of cores, and thus the scaling_max_freq reported will be the same across all the cores. > > Thanks, > -Andrea -- Thanks and Regards gautham.
On Wed, Aug 28, 2024 at 08:27:44PM +0530, Gautham R. Shenoy wrote: > Hello Andrea, > > On Wed, Aug 28, 2024 at 08:20:50AM +0200, Andrea Righi wrote: > > On Wed, Aug 28, 2024 at 10:38:45AM +0530, Gautham R. Shenoy wrote: > > ... > > > > I had thought this was a malfunction in the behavior that it reflected the > > > > current status, not the hardware /capability/. > > > > > > > > Which one makes more sense for userspace? In my mind the most likely > > > > consumer of this information would be something a sched_ext based userspace > > > > scheduler. They would need to know whether the scheduler was using > > > > preferred cores; not whether the hardware supported it. > > > > > > The commandline parameter currently impacts only the fair sched-class > > > tasks since the preference information gets used only during > > > load-balancing. > > > > > > IMO, the same should continue with sched-ext, i.e. if the user has > > > explicitly disabled prefcore support via commandline, the no sched-ext > > > scheduler should use the preference information to make task placement > > > decisions. However, I would like to see what the sched-ext folks have > > > to say. Adding some of them to the Cc list. > > > > IMHO it makes more sense to reflect the real state of prefcore support > > from a "system" perspective, more than a "hardware" perspective, so if > > it's disabled via boot command line it should show disabled. > > > > From a user-space scheduler perspective we should be fine either way, as > > long as the ABI is clearly documented, since we also have access to > > /proc/cmdline and we would be able to figure out if the user has > > disabled it via cmdline (however, the preference is still to report the > > actual system status). > > Thank you for confirming this. > > > > > Question: having prefcore enabled affects also the value of > > scaling_max_freq? Like an `lscpu -e`, for example, would show a higher > > max frequency for the specific preferred cores? (this is another useful > > information from a sched_ext scheduler perspective). > > Since the scaling_max_freq is computed based on the boost-numerator, > at least from this patchset, the numerator would be the same across > all kinds of cores, and thus the scaling_max_freq reported will be the > same across all the cores. I see, so IIUC from user-space the most reliable way to detect the fastest cores is to check amd_pstate_highest_perf / amd_pstate_max_freq, right? I'm trying to figure out a way to abstract and generalize the concept of "fast cores" in sched_ext. Also, is this something that has changed recently? I see this on an AMD Ryzen Threadripper PRO 7975WX 32-Cores running a 6.8 kernel: $ uname -r 6.8.0-40-generic $ grep . /sys/devices/system/cpu/cpu*/cpufreq/cpuinfo_max_freq /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu1/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu2/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu3/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu4/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu5/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu6/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu7/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu8/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu9/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu10/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu11/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu12/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu13/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu14/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu15/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu16/cpufreq/cpuinfo_max_freq:6161000 /sys/devices/system/cpu/cpu17/cpufreq/cpuinfo_max_freq:6321000 /sys/devices/system/cpu/cpu18/cpufreq/cpuinfo_max_freq:6001000 /sys/devices/system/cpu/cpu19/cpufreq/cpuinfo_max_freq:6646000 /sys/devices/system/cpu/cpu20/cpufreq/cpuinfo_max_freq:5837000 /sys/devices/system/cpu/cpu21/cpufreq/cpuinfo_max_freq:6482000 /sys/devices/system/cpu/cpu22/cpufreq/cpuinfo_max_freq:5517000 /sys/devices/system/cpu/cpu23/cpufreq/cpuinfo_max_freq:5677000 /sys/devices/system/cpu/cpu24/cpufreq/cpuinfo_max_freq:6966000 /sys/devices/system/cpu/cpu25/cpufreq/cpuinfo_max_freq:7775000 /sys/devices/system/cpu/cpu26/cpufreq/cpuinfo_max_freq:6806000 /sys/devices/system/cpu/cpu27/cpufreq/cpuinfo_max_freq:7775000 /sys/devices/system/cpu/cpu28/cpufreq/cpuinfo_max_freq:7130000 /sys/devices/system/cpu/cpu29/cpufreq/cpuinfo_max_freq:7451000 /sys/devices/system/cpu/cpu30/cpufreq/cpuinfo_max_freq:7290000 /sys/devices/system/cpu/cpu31/cpufreq/cpuinfo_max_freq:7611000 /sys/devices/system/cpu/cpu32/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu33/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu34/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu35/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu36/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu37/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu38/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu39/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu40/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu41/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu42/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu43/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu44/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu45/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu46/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu47/cpufreq/cpuinfo_max_freq:5352000 /sys/devices/system/cpu/cpu48/cpufreq/cpuinfo_max_freq:6161000 /sys/devices/system/cpu/cpu49/cpufreq/cpuinfo_max_freq:6321000 /sys/devices/system/cpu/cpu50/cpufreq/cpuinfo_max_freq:6001000 /sys/devices/system/cpu/cpu51/cpufreq/cpuinfo_max_freq:6646000 /sys/devices/system/cpu/cpu52/cpufreq/cpuinfo_max_freq:5837000 /sys/devices/system/cpu/cpu53/cpufreq/cpuinfo_max_freq:6482000 /sys/devices/system/cpu/cpu54/cpufreq/cpuinfo_max_freq:5517000 /sys/devices/system/cpu/cpu55/cpufreq/cpuinfo_max_freq:5677000 /sys/devices/system/cpu/cpu56/cpufreq/cpuinfo_max_freq:6966000 /sys/devices/system/cpu/cpu57/cpufreq/cpuinfo_max_freq:7775000 /sys/devices/system/cpu/cpu58/cpufreq/cpuinfo_max_freq:6806000 /sys/devices/system/cpu/cpu59/cpufreq/cpuinfo_max_freq:7775000 /sys/devices/system/cpu/cpu60/cpufreq/cpuinfo_max_freq:7130000 /sys/devices/system/cpu/cpu61/cpufreq/cpuinfo_max_freq:7451000 /sys/devices/system/cpu/cpu62/cpufreq/cpuinfo_max_freq:7290000 /sys/devices/system/cpu/cpu63/cpufreq/cpuinfo_max_freq:7611000 Thanks, -Andrea
On 8/29/2024 07:52, Andrea Righi wrote: > On Wed, Aug 28, 2024 at 08:27:44PM +0530, Gautham R. Shenoy wrote: >> Hello Andrea, >> >> On Wed, Aug 28, 2024 at 08:20:50AM +0200, Andrea Righi wrote: >>> On Wed, Aug 28, 2024 at 10:38:45AM +0530, Gautham R. Shenoy wrote: >>> ... >>>>> I had thought this was a malfunction in the behavior that it reflected the >>>>> current status, not the hardware /capability/. >>>>> >>>>> Which one makes more sense for userspace? In my mind the most likely >>>>> consumer of this information would be something a sched_ext based userspace >>>>> scheduler. They would need to know whether the scheduler was using >>>>> preferred cores; not whether the hardware supported it. >>>> >>>> The commandline parameter currently impacts only the fair sched-class >>>> tasks since the preference information gets used only during >>>> load-balancing. >>>> >>>> IMO, the same should continue with sched-ext, i.e. if the user has >>>> explicitly disabled prefcore support via commandline, the no sched-ext >>>> scheduler should use the preference information to make task placement >>>> decisions. However, I would like to see what the sched-ext folks have >>>> to say. Adding some of them to the Cc list. >>> >>> IMHO it makes more sense to reflect the real state of prefcore support >>> from a "system" perspective, more than a "hardware" perspective, so if >>> it's disabled via boot command line it should show disabled. >>> >>> From a user-space scheduler perspective we should be fine either way, as >>> long as the ABI is clearly documented, since we also have access to >>> /proc/cmdline and we would be able to figure out if the user has >>> disabled it via cmdline (however, the preference is still to report the >>> actual system status). >> >> Thank you for confirming this. >> >>> >>> Question: having prefcore enabled affects also the value of >>> scaling_max_freq? Like an `lscpu -e`, for example, would show a higher >>> max frequency for the specific preferred cores? (this is another useful >>> information from a sched_ext scheduler perspective). >> >> Since the scaling_max_freq is computed based on the boost-numerator, >> at least from this patchset, the numerator would be the same across >> all kinds of cores, and thus the scaling_max_freq reported will be the >> same across all the cores. > > I see, so IIUC from user-space the most reliable way to detect the > fastest cores is to check amd_pstate_highest_perf / amd_pstate_max_freq, > right? I'm trying to figure out a way to abstract and generalize the > concept of "fast cores" in sched_ext. Right now the best way to do this is to look at the amd_pstate_precore_ranking file. In this series there has been some discussion of dropping it though in favor of looking at the highest perf file. I don't believe we're concluded one way or another on it yet though. > > Also, is this something that has changed recently? I see this on an > AMD Ryzen Threadripper PRO 7975WX 32-Cores running a 6.8 kernel: > > $ uname -r > 6.8.0-40-generic You're missing the preferred core patches on this kernel. They landed in 6.9, it's better to upgrade to 6.10.y or 6.11-rc.
On Thu, Aug 29, 2024 at 08:01:48AM -0500, Mario Limonciello wrote: > On 8/29/2024 07:52, Andrea Righi wrote: > > On Wed, Aug 28, 2024 at 08:27:44PM +0530, Gautham R. Shenoy wrote: > > > Hello Andrea, > > > > > > On Wed, Aug 28, 2024 at 08:20:50AM +0200, Andrea Righi wrote: > > > > On Wed, Aug 28, 2024 at 10:38:45AM +0530, Gautham R. Shenoy wrote: > > > > ... > > > > > > I had thought this was a malfunction in the behavior that it reflected the > > > > > > current status, not the hardware /capability/. > > > > > > > > > > > > Which one makes more sense for userspace? In my mind the most likely > > > > > > consumer of this information would be something a sched_ext based userspace > > > > > > scheduler. They would need to know whether the scheduler was using > > > > > > preferred cores; not whether the hardware supported it. > > > > > > > > > > The commandline parameter currently impacts only the fair sched-class > > > > > tasks since the preference information gets used only during > > > > > load-balancing. > > > > > > > > > > IMO, the same should continue with sched-ext, i.e. if the user has > > > > > explicitly disabled prefcore support via commandline, the no sched-ext > > > > > scheduler should use the preference information to make task placement > > > > > decisions. However, I would like to see what the sched-ext folks have > > > > > to say. Adding some of them to the Cc list. > > > > > > > > IMHO it makes more sense to reflect the real state of prefcore support > > > > from a "system" perspective, more than a "hardware" perspective, so if > > > > it's disabled via boot command line it should show disabled. > > > > > > > > From a user-space scheduler perspective we should be fine either way, as > > > > long as the ABI is clearly documented, since we also have access to > > > > /proc/cmdline and we would be able to figure out if the user has > > > > disabled it via cmdline (however, the preference is still to report the > > > > actual system status). > > > > > > Thank you for confirming this. > > > > > > > > > > > Question: having prefcore enabled affects also the value of > > > > scaling_max_freq? Like an `lscpu -e`, for example, would show a higher > > > > max frequency for the specific preferred cores? (this is another useful > > > > information from a sched_ext scheduler perspective). > > > > > > Since the scaling_max_freq is computed based on the boost-numerator, > > > at least from this patchset, the numerator would be the same across > > > all kinds of cores, and thus the scaling_max_freq reported will be the > > > same across all the cores. > > > > I see, so IIUC from user-space the most reliable way to detect the > > fastest cores is to check amd_pstate_highest_perf / amd_pstate_max_freq, > > right? I'm trying to figure out a way to abstract and generalize the > > concept of "fast cores" in sched_ext. > > Right now the best way to do this is to look at the > amd_pstate_precore_ranking file. Ok. > > In this series there has been some discussion of dropping it though in favor > of looking at the highest perf file. I don't believe we're concluded one > way or another on it yet though. > > > > > Also, is this something that has changed recently? I see this on an > > AMD Ryzen Threadripper PRO 7975WX 32-Cores running a 6.8 kernel: > > > > $ uname -r > > 6.8.0-40-generic > > You're missing the preferred core patches on this kernel. They landed in > 6.9, it's better to upgrade to 6.10.y or 6.11-rc. So, if I move to 6.9+ I should see the same max frequency across all the CPUs and I can use amd_pstate_precore_ranking to determine the subset of fast cores. Thanks for the clarification. -Andrea
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index ed05d7a0add10..257e28e549bd1 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -1112,12 +1112,7 @@ static ssize_t show_amd_pstate_prefcore_ranking(struct cpufreq_policy *policy, static ssize_t show_amd_pstate_hw_prefcore(struct cpufreq_policy *policy, char *buf) { - bool hw_prefcore; - struct amd_cpudata *cpudata = policy->driver_data; - - hw_prefcore = READ_ONCE(cpudata->hw_prefcore); - - return sysfs_emit(buf, "%s\n", str_enabled_disabled(hw_prefcore)); + return sysfs_emit(buf, "%s\n", str_enabled_disabled(amd_pstate_prefcore)); } static ssize_t show_energy_performance_available_preferences(