diff mbox series

[v4,4/4] cpufreq: Use arch specific feedback for cpuinfo_cur_freq

Message ID 20240405133319.859813-5-beata.michalska@arm.com (mailing list archive)
State New, archived
Headers show
Series Add support for AArch64 AMUv1-based arch_freq_get_on_cpu | expand

Commit Message

Beata Michalska April 5, 2024, 1:33 p.m. UTC
Some architectures provide a way to determine an average frequency over
a certain period of time based on available performance monitors (AMU on
ARM or APERF/MPERf on x86). With those at hand, enroll arch_freq_get_on_cpu
into cpuinfo_cur_freq policy sysfs attribute handler, which is expected to
represent the current frequency of a given CPU, as obtained by the hardware.
This is the type of feedback that counters do provide.

Signed-off-by: Beata Michalska <beata.michalska@arm.com>
---
 drivers/cpufreq/cpufreq.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Vanshidhar Konda April 16, 2024, 4:23 a.m. UTC | #1
On Fri, Apr 05, 2024 at 02:33:19PM +0100, Beata Michalska wrote:
>Some architectures provide a way to determine an average frequency over
>a certain period of time based on available performance monitors (AMU on
>ARM or APERF/MPERf on x86). With those at hand, enroll arch_freq_get_on_cpu
>into cpuinfo_cur_freq policy sysfs attribute handler, which is expected to
>represent the current frequency of a given CPU, as obtained by the hardware.
>This is the type of feedback that counters do provide.
>

--- snip ---

While testing this patch series on AmpereOne system, I simulated CPU
frequency throttling when the system is under power or thermal
constraints.

In this scenario, based on the user guilde, I expect scaling_cur_freq
is the frequency the kernel requests from the hardware; cpuinfo_cur_freq
is the actual frequency that the hardware is able to run at during the
power or thermal constraints.

The AmpereOne system I'm testing on has the following configuration:
- Max frequency is 3000000
- Support for AMU registers
- ACPI CPPC feedback counters use PCC register space
- Fedora 39 with 6.7.5 kernel
- Fedora 39 with 6.9.0-rc3 + this patch series

With 6.7.5 kernel:
Core        scaling_cur_freq        cpuinfo_cur_freq
----        ----------------        ----------------
0             3000000                 2593000
1             3000000                 2613000
2             3000000                 2625000
3             3000000                 2632000

With 6.9.0-rc3 + this patch series:
Core        scaling_cur_freq        cpuinfo_cur_freq
----        ----------------        ----------------
0             2671875                 2671875
1             2589632                 2589632
2             2648437                 2648437
3             2698242                 2698242

In the second case we can't identify that the CPU frequency is
being throttled by the hardware. I noticed this behavior with
or without this patch.

Thanks,
Vanshi
Beata Michalska April 16, 2024, 3:46 p.m. UTC | #2
On Mon, Apr 15, 2024 at 09:23:10PM -0700, Vanshidhar Konda wrote:
> On Fri, Apr 05, 2024 at 02:33:19PM +0100, Beata Michalska wrote:
> > Some architectures provide a way to determine an average frequency over
> > a certain period of time based on available performance monitors (AMU on
> > ARM or APERF/MPERf on x86). With those at hand, enroll arch_freq_get_on_cpu
> > into cpuinfo_cur_freq policy sysfs attribute handler, which is expected to
> > represent the current frequency of a given CPU, as obtained by the hardware.
> > This is the type of feedback that counters do provide.
> > 
> 
> --- snip ---
> 
> While testing this patch series on AmpereOne system, I simulated CPU
> frequency throttling when the system is under power or thermal
> constraints.
> 
> In this scenario, based on the user guilde, I expect scaling_cur_freq
> is the frequency the kernel requests from the hardware; cpuinfo_cur_freq
> is the actual frequency that the hardware is able to run at during the
> power or thermal constraints.
There has been a discussion on scaling_cur_freq vs cpuinfo_cur_freq [1].
The guidelines you are referring here (assuming you mean [2]) are kinda
out-of-sync already as scaling_cur_freq has been wired earlier to use arch
specific feedback. As there was no Arm dedicated implementation of
arch_freq_get_on_cpu, this went kinda unnoticed.
The conclusion of the above mentioned discussion (though rather unstated
explicitly) was to keep the current behaviour of scaling_cur_freq and align
both across different archs: so with the patches, both attributes will provide
hw feedback on current frequency, when available.
Note that if we are to adhere to the docs cpuinfo_cur_freq is the place to use
the counters really.

That change was also requested through [3]

Adding @Viresh in case there was any shift in the tides ....
> 
> The AmpereOne system I'm testing on has the following configuration:
> - Max frequency is 3000000
> - Support for AMU registers
> - ACPI CPPC feedback counters use PCC register space
> - Fedora 39 with 6.7.5 kernel
> - Fedora 39 with 6.9.0-rc3 + this patch series
> 
> With 6.7.5 kernel:
> Core        scaling_cur_freq        cpuinfo_cur_freq
> ----        ----------------        ----------------
> 0             3000000                 2593000
> 1             3000000                 2613000
> 2             3000000                 2625000
> 3             3000000                 2632000
> 
So if I got it right from the info you have provided the numbers above are
obtained without applying the patches. In that case, scaling_cur_freq will
use policy->cur (in your case) showing last frequency set, not necessarily
the actual freq, whereas cpuinfo_cur_freq uses __cpufreq_get and AMU counters.


> With 6.9.0-rc3 + this patch series:
> Core        scaling_cur_freq        cpuinfo_cur_freq
> ----        ----------------        ----------------
> 0             2671875                 2671875
> 1             2589632                 2589632
> 2             2648437                 2648437
> 3             2698242                 2698242
>
With the patches applied both scaling_cur_freq and cpuinfo_cur_freq will use AMU
counters, or fie scale factor obtained based on AMU counters to be more precise:
both should now show similar/same frequency (as discussed in [1])
I'd say due to existing implementation for scaling_cur_freq (which we cannot
change at this point) this is unavoidable.

> In the second case we can't identify that the CPU frequency is
> being throttled by the hardware. I noticed this behavior with
> or without this patch.
>
I am not entirely sure comparing the two should be a way to go about throttling
(whether w/ or w/o the changes).
It would probably be best to refer to thermal sysfs and get a hold of cur_state
which should indicate current throttle state:

 /sys/class/thermal/thermal_zone[0-*]/cdev[0-*]/cur_state

with values above '0' implying ongoing throttling.

The appropriate thermal_zone can be identified through 'type' attribute.


Thank you for giving those patches a spin.

---
BR
Beata
---
[1] https://lore.kernel.org/all/20230609043922.eyyqutbwlofqaddz@vireshk-i7/
[2] https://elixir.bootlin.com/linux/latest/source/Documentation/admin-guide/pm/cpufreq.rst#L197
[3] https://lore.kernel.org/lkml/2cfbc633-1e94-d741-2337-e1b0cf48b81b@nvidia.com/
---


> Thanks,
> Vanshi
Vanshidhar Konda April 17, 2024, 9:38 p.m. UTC | #3
On Tue, Apr 16, 2024 at 05:46:18PM +0200, Beata Michalska wrote:
>On Mon, Apr 15, 2024 at 09:23:10PM -0700, Vanshidhar Konda wrote:
>> On Fri, Apr 05, 2024 at 02:33:19PM +0100, Beata Michalska wrote:
>> > Some architectures provide a way to determine an average frequency over
>> > a certain period of time based on available performance monitors (AMU on
>> > ARM or APERF/MPERf on x86). With those at hand, enroll arch_freq_get_on_cpu
>> > into cpuinfo_cur_freq policy sysfs attribute handler, which is expected to
>> > represent the current frequency of a given CPU, as obtained by the hardware.
>> > This is the type of feedback that counters do provide.
>> >
>>
>> --- snip ---
>>
>> While testing this patch series on AmpereOne system, I simulated CPU
>> frequency throttling when the system is under power or thermal
>> constraints.
>>
>> In this scenario, based on the user guilde, I expect scaling_cur_freq
>> is the frequency the kernel requests from the hardware; cpuinfo_cur_freq
>> is the actual frequency that the hardware is able to run at during the
>> power or thermal constraints.
>There has been a discussion on scaling_cur_freq vs cpuinfo_cur_freq [1].
>The guidelines you are referring here (assuming you mean [2]) are kinda
>out-of-sync already as scaling_cur_freq has been wired earlier to use arch
>specific feedback. As there was no Arm dedicated implementation of
>arch_freq_get_on_cpu, this went kinda unnoticed.
>The conclusion of the above mentioned discussion (though rather unstated
>explicitly) was to keep the current behaviour of scaling_cur_freq and align
>both across different archs: so with the patches, both attributes will provide
>hw feedback on current frequency, when available.
>Note that if we are to adhere to the docs cpuinfo_cur_freq is the place to use
>the counters really.
>
>That change was also requested through [3]
>
>Adding @Viresh in case there was any shift in the tides ....
>

Thank you for the pointer to the discussion in [1]. It makes sense to
bring arm64 behavior in line with x86. The question about whether
modifying the behavior of scaling_cur_freq was a good idea did not get
any response.

>>
>> The AmpereOne system I'm testing on has the following configuration:
>> - Max frequency is 3000000
>> - Support for AMU registers
>> - ACPI CPPC feedback counters use PCC register space
>> - Fedora 39 with 6.7.5 kernel
>> - Fedora 39 with 6.9.0-rc3 + this patch series
>>
>> With 6.7.5 kernel:
>> Core        scaling_cur_freq        cpuinfo_cur_freq
>> ----        ----------------        ----------------
>> 0             3000000                 2593000
>> 1             3000000                 2613000
>> 2             3000000                 2625000
>> 3             3000000                 2632000
>>
>So if I got it right from the info you have provided the numbers above are
>obtained without applying the patches. In that case, scaling_cur_freq will
>use policy->cur (in your case) showing last frequency set, not necessarily
>the actual freq, whereas cpuinfo_cur_freq uses __cpufreq_get and AMU counters.
>
>
>> With 6.9.0-rc3 + this patch series:
>> Core        scaling_cur_freq        cpuinfo_cur_freq
>> ----        ----------------        ----------------
>> 0             2671875                 2671875
>> 1             2589632                 2589632
>> 2             2648437                 2648437
>> 3             2698242                 2698242
>>
>With the patches applied both scaling_cur_freq and cpuinfo_cur_freq will use AMU
>counters, or fie scale factor obtained based on AMU counters to be more precise:
>both should now show similar/same frequency (as discussed in [1])
>I'd say due to existing implementation for scaling_cur_freq (which we cannot
>change at this point) this is unavoidable.
>
>> In the second case we can't identify that the CPU frequency is
>> being throttled by the hardware. I noticed this behavior with
>> or without this patch.
>>
>I am not entirely sure comparing the two should be a way to go about throttling
>(whether w/ or w/o the changes).
>It would probably be best to refer to thermal sysfs and get a hold of cur_state

Throttling could happen due to non-thermal reasons. Or a system may not
even support thermal zones. So on those systems we wouldn't be able to
identify/debug the behavior of the hardware providing less than maximum
performance. The discussion around scaling_cur_freq should probably be
re-visited in a targeted manner I think.

I'll test v5 of the series on AmpereOne and report back on that thread.

Thanks,
Vanshi

>which should indicate current throttle state:
>
> /sys/class/thermal/thermal_zone[0-*]/cdev[0-*]/cur_state
>
>with values above '0' implying ongoing throttling.
>
>The appropriate thermal_zone can be identified through 'type' attribute.
>
>
>Thank you for giving those patches a spin.
>
>---
>BR
>Beata
>---
>[1] https://lore.kernel.org/all/20230609043922.eyyqutbwlofqaddz@vireshk-i7/
>[2] https://elixir.bootlin.com/linux/latest/source/Documentation/admin-guide/pm/cpufreq.rst#L197
>[3] https://lore.kernel.org/lkml/2cfbc633-1e94-d741-2337-e1b0cf48b81b@nvidia.com/
>---
>
>
>> Thanks,
>> Vanshi
Beata Michalska April 26, 2024, 10:45 a.m. UTC | #4
On Wed, Apr 17, 2024 at 02:38:58PM -0700, Vanshidhar Konda wrote:
> On Tue, Apr 16, 2024 at 05:46:18PM +0200, Beata Michalska wrote:
> > On Mon, Apr 15, 2024 at 09:23:10PM -0700, Vanshidhar Konda wrote:
> > > On Fri, Apr 05, 2024 at 02:33:19PM +0100, Beata Michalska wrote:
> > > > Some architectures provide a way to determine an average frequency over
> > > > a certain period of time based on available performance monitors (AMU on
> > > > ARM or APERF/MPERf on x86). With those at hand, enroll arch_freq_get_on_cpu
> > > > into cpuinfo_cur_freq policy sysfs attribute handler, which is expected to
> > > > represent the current frequency of a given CPU, as obtained by the hardware.
> > > > This is the type of feedback that counters do provide.
> > > >
> > > 
> > > --- snip ---
> > > 
> > > While testing this patch series on AmpereOne system, I simulated CPU
> > > frequency throttling when the system is under power or thermal
> > > constraints.
> > > 
> > > In this scenario, based on the user guilde, I expect scaling_cur_freq
> > > is the frequency the kernel requests from the hardware; cpuinfo_cur_freq
> > > is the actual frequency that the hardware is able to run at during the
> > > power or thermal constraints.
> > There has been a discussion on scaling_cur_freq vs cpuinfo_cur_freq [1].
> > The guidelines you are referring here (assuming you mean [2]) are kinda
> > out-of-sync already as scaling_cur_freq has been wired earlier to use arch
> > specific feedback. As there was no Arm dedicated implementation of
> > arch_freq_get_on_cpu, this went kinda unnoticed.
> > The conclusion of the above mentioned discussion (though rather unstated
> > explicitly) was to keep the current behaviour of scaling_cur_freq and align
> > both across different archs: so with the patches, both attributes will provide
> > hw feedback on current frequency, when available.
> > Note that if we are to adhere to the docs cpuinfo_cur_freq is the place to use
> > the counters really.
> > 
> > That change was also requested through [3]
> > 
> > Adding @Viresh in case there was any shift in the tides ....
> > 
> 
> Thank you for the pointer to the discussion in [1]. It makes sense to
> bring arm64 behavior in line with x86. The question about whether
> modifying the behavior of scaling_cur_freq was a good idea did not get
> any response.
> 
> > > 
> > > The AmpereOne system I'm testing on has the following configuration:
> > > - Max frequency is 3000000
> > > - Support for AMU registers
> > > - ACPI CPPC feedback counters use PCC register space
> > > - Fedora 39 with 6.7.5 kernel
> > > - Fedora 39 with 6.9.0-rc3 + this patch series
> > > 
> > > With 6.7.5 kernel:
> > > Core        scaling_cur_freq        cpuinfo_cur_freq
> > > ----        ----------------        ----------------
> > > 0             3000000                 2593000
> > > 1             3000000                 2613000
> > > 2             3000000                 2625000
> > > 3             3000000                 2632000
> > > 
> > So if I got it right from the info you have provided the numbers above are
> > obtained without applying the patches. In that case, scaling_cur_freq will
> > use policy->cur (in your case) showing last frequency set, not necessarily
> > the actual freq, whereas cpuinfo_cur_freq uses __cpufreq_get and AMU counters.
> > 
> > 
> > > With 6.9.0-rc3 + this patch series:
> > > Core        scaling_cur_freq        cpuinfo_cur_freq
> > > ----        ----------------        ----------------
> > > 0             2671875                 2671875
> > > 1             2589632                 2589632
> > > 2             2648437                 2648437
> > > 3             2698242                 2698242
> > > 
> > With the patches applied both scaling_cur_freq and cpuinfo_cur_freq will use AMU
> > counters, or fie scale factor obtained based on AMU counters to be more precise:
> > both should now show similar/same frequency (as discussed in [1])
> > I'd say due to existing implementation for scaling_cur_freq (which we cannot
> > change at this point) this is unavoidable.
> > 
> > > In the second case we can't identify that the CPU frequency is
> > > being throttled by the hardware. I noticed this behavior with
> > > or without this patch.
> > > 
> > I am not entirely sure comparing the two should be a way to go about throttling
> > (whether w/ or w/o the changes).
> > It would probably be best to refer to thermal sysfs and get a hold of cur_state
> 
> Throttling could happen due to non-thermal reasons. Or a system may not
> even support thermal zones. So on those systems we wouldn't be able to
> identify/debug the behavior of the hardware providing less than maximum
> performance. The discussion around scaling_cur_freq should probably be
> re-visited in a targeted manner I think.
> 

@Viresh:

It seems that we might need to revisit the discussion we've had around
scaling_cur_freq and cpuinfo_cur_freq and the use of arch_freq_get_on_cpu.
As Vanshi has raised, having both utilizing arch specific feedback for
getting current frequency is bit problematic and might be confusing at best.
As arch_freq_get_on_cpu is already used by show_scaling_cur_freq there are not
many options we are left with, if we want to kee all archs aligned:
we can either try to rework show_scaling_cur_freq and it's use of
arch_freq_get_on_cpu, and move it to cpuinfo_cur_freq, which would align with
relevant docs, though that will not work for x86, or we keep it only there and
skip updating cpuinfo_cur_freq, going against the guidelines. Other options,
purely theoretical, would involve making arch_freq_get_on_cpu aware of type of
the info requested (hw vs sw) or adding yet another arch-specific implementation,
and those are not really appealing alternatives to say at least.
What's your opinion on this one ?

---
BR
Beata


> I'll test v5 of the series on AmpereOne and report back on that thread.
> 
> Thanks,
> Vanshi
> 
> > which should indicate current throttle state:
> > 
> > /sys/class/thermal/thermal_zone[0-*]/cdev[0-*]/cur_state
> > 
> > with values above '0' implying ongoing throttling.
> > 
> > The appropriate thermal_zone can be identified through 'type' attribute.
> > 
> > 
> > Thank you for giving those patches a spin.
> > 
> > ---
> > BR
> > Beata
> > ---
> > [1] https://lore.kernel.org/all/20230609043922.eyyqutbwlofqaddz@vireshk-i7/
> > [2] https://elixir.bootlin.com/linux/latest/source/Documentation/admin-guide/pm/cpufreq.rst#L197
> > [3] https://lore.kernel.org/lkml/2cfbc633-1e94-d741-2337-e1b0cf48b81b@nvidia.com/
> > ---
> > 
> > 
> > > Thanks,
> > > Vanshi
Viresh Kumar April 29, 2024, 9:25 a.m. UTC | #5
On 26-04-24, 12:45, Beata Michalska wrote:
> It seems that we might need to revisit the discussion we've had around
> scaling_cur_freq and cpuinfo_cur_freq and the use of arch_freq_get_on_cpu.
> As Vanshi has raised, having both utilizing arch specific feedback for
> getting current frequency is bit problematic and might be confusing at best.
> As arch_freq_get_on_cpu is already used by show_scaling_cur_freq there are not
> many options we are left with, if we want to kee all archs aligned:
> we can either try to rework show_scaling_cur_freq and it's use of
> arch_freq_get_on_cpu, and move it to cpuinfo_cur_freq, which would align with
> relevant docs, though that will not work for x86, or we keep it only there and
> skip updating cpuinfo_cur_freq, going against the guidelines. Other options,
> purely theoretical, would involve making arch_freq_get_on_cpu aware of type of
> the info requested (hw vs sw) or adding yet another arch-specific implementation,
> and those are not really appealing alternatives to say at least.
> What's your opinion on this one ?

Hi Beata / Vanshidhar,

Lets forget for once what X86 and ARM may have done and think about it
once again. I also had a chat with Vincent today about this.

The documentation says it clearly, cpuinfo_cur_freq is the one
received from hardware and scaling_cur_freq is the one requested from
software.

Now, I know that X86 has made both of them quite similar and I
suggested to make them all aligned (and never received a reply on my
previous message).

There are few reasons why it may be worth keeping the definition (and
behavior) of the sysfs files as is, at least for ARM:
- First is that the documentation says so.
- There is no point providing the same information via both the
  interfaces, there are two interfaces here for a reason.
- There maybe tools around which depend on the documented behavior.
- From userspace, currently there is only one way to know the exact
  frequency that the cpufreq governors have requested from a platform,
  i.e. the value from scaling_cur_freq. If we make it similar to
  cpuinfo_cur_freq, then userspace will never know about the requested
  frequency and the eventual one and if they are same or different.

Lets keep the behavior as is and update only cpuinfo_cur_freq with
arch_freq_get_on_cpu().

Makes sense ?
Vanshidhar Konda May 1, 2024, 2:46 p.m. UTC | #6
On Mon, Apr 29, 2024 at 02:55:15PM +0530, Viresh Kumar wrote:
>On 26-04-24, 12:45, Beata Michalska wrote:
>> It seems that we might need to revisit the discussion we've had around
>> scaling_cur_freq and cpuinfo_cur_freq and the use of arch_freq_get_on_cpu.
>> As Vanshi has raised, having both utilizing arch specific feedback for
>> getting current frequency is bit problematic and might be confusing at best.
>> As arch_freq_get_on_cpu is already used by show_scaling_cur_freq there are not
>> many options we are left with, if we want to kee all archs aligned:
>> we can either try to rework show_scaling_cur_freq and it's use of
>> arch_freq_get_on_cpu, and move it to cpuinfo_cur_freq, which would align with
>> relevant docs, though that will not work for x86, or we keep it only there and
>> skip updating cpuinfo_cur_freq, going against the guidelines. Other options,
>> purely theoretical, would involve making arch_freq_get_on_cpu aware of type of
>> the info requested (hw vs sw) or adding yet another arch-specific implementation,
>> and those are not really appealing alternatives to say at least.
>> What's your opinion on this one ?
>
>Hi Beata / Vanshidhar,
>
>Lets forget for once what X86 and ARM may have done and think about it
>once again. I also had a chat with Vincent today about this.
>
>The documentation says it clearly, cpuinfo_cur_freq is the one
>received from hardware and scaling_cur_freq is the one requested from
>software.
>
>Now, I know that X86 has made both of them quite similar and I
>suggested to make them all aligned (and never received a reply on my
>previous message).
>
>There are few reasons why it may be worth keeping the definition (and
>behavior) of the sysfs files as is, at least for ARM:
>- First is that the documentation says so.
>- There is no point providing the same information via both the
>  interfaces, there are two interfaces here for a reason.
>- There maybe tools around which depend on the documented behavior.
>- From userspace, currently there is only one way to know the exact
>  frequency that the cpufreq governors have requested from a platform,
>  i.e. the value from scaling_cur_freq. If we make it similar to
>  cpuinfo_cur_freq, then userspace will never know about the requested
>  frequency and the eventual one and if they are same or different.
>
>Lets keep the behavior as is and update only cpuinfo_cur_freq with
>arch_freq_get_on_cpu().
>
>Makes sense ?

I had the same concerns. It probably makes sense explicity note this in
the next version of the patch series; in the future readers may be
confused by x86 and ARM behave differntly on scaling_cur_freq.

Thanks,
Vanshi

>
>-- 
>viresh
Beata Michalska May 7, 2024, 8:31 a.m. UTC | #7
On Mon, Apr 29, 2024 at 02:55:15PM +0530, Viresh Kumar wrote:
> On 26-04-24, 12:45, Beata Michalska wrote:
> > It seems that we might need to revisit the discussion we've had around
> > scaling_cur_freq and cpuinfo_cur_freq and the use of arch_freq_get_on_cpu.
> > As Vanshi has raised, having both utilizing arch specific feedback for
> > getting current frequency is bit problematic and might be confusing at best.
> > As arch_freq_get_on_cpu is already used by show_scaling_cur_freq there are not
> > many options we are left with, if we want to kee all archs aligned:
> > we can either try to rework show_scaling_cur_freq and it's use of
> > arch_freq_get_on_cpu, and move it to cpuinfo_cur_freq, which would align with
> > relevant docs, though that will not work for x86, or we keep it only there and
> > skip updating cpuinfo_cur_freq, going against the guidelines. Other options,
> > purely theoretical, would involve making arch_freq_get_on_cpu aware of type of
> > the info requested (hw vs sw) or adding yet another arch-specific implementation,
> > and those are not really appealing alternatives to say at least.
> > What's your opinion on this one ?
> 
> Hi Beata / Vanshidhar,
> 
> Lets forget for once what X86 and ARM may have done and think about it
> once again. I also had a chat with Vincent today about this.
> 
> The documentation says it clearly, cpuinfo_cur_freq is the one
> received from hardware and scaling_cur_freq is the one requested from
> software.
> 
> Now, I know that X86 has made both of them quite similar and I
> suggested to make them all aligned (and never received a reply on my
> previous message).
> 
> There are few reasons why it may be worth keeping the definition (and
> behavior) of the sysfs files as is, at least for ARM:
> - First is that the documentation says so.
> - There is no point providing the same information via both the
>   interfaces, there are two interfaces here for a reason.
> - There maybe tools around which depend on the documented behavior.
> - From userspace, currently there is only one way to know the exact
>   frequency that the cpufreq governors have requested from a platform,
>   i.e. the value from scaling_cur_freq. If we make it similar to
>   cpuinfo_cur_freq, then userspace will never know about the requested
>   frequency and the eventual one and if they are same or different.
> 
> Lets keep the behavior as is and update only cpuinfo_cur_freq with
> arch_freq_get_on_cpu().
> 
> Makes sense ?
>
First of all - apologies for late reply.

It all makes sense, though to clarify things up, for my own benefit, and to
avoid any potential confusion ....

Adding arch_freq_get_on_cpu to cpuinfo_cur_freq does seem to be the right
approach - no argue on this one. Doing that though means we need a way to
skip calling arch_freq_get_on_cpu() from show_scaling_cur_freq(), to avoid
having both providing the same information when that should not be the case.
In the initial approach [1], that was handled by checking whether the cpufreq
driver supports 'get' callback (and thus cpuinfo_cur_freq). In case it didn't,
things remained unchanged for scaling_cur_freq. That does not seem to be a viable
option though, as there are at least few drivers, that will support both:
cpuinfo_cur_freq alongside scaling_cur_freq (+ APERF/MPERF) and for those,
we would hit the initial problem of both relying on arch_freq_get_on_cpu.
So I guess we need another way of avoiding calling arch_freq_get_on_cpu
for show_scaling_cur_freq (and most probably avoid calling that one for
cpuinfo_cur_freq). Quick idea on how to not bring arch specificity into
cpufreq generic code would be to introduce a new flag for cpufreq drivers though
that seems a bit stretched. Will ponder a bit about that but in the meantime
suggestions are more than welcomed.

Aside: I will most probably send the changes separately from this series to not
mix things any further.

---
[1] https://lore.kernel.org/all/20230606155754.245998-1-beata.michalska@arm.com/
---
BR
Beata


> -- 
> viresh
Beata Michalska May 7, 2024, 10:02 a.m. UTC | #8
On Tue, May 07, 2024 at 10:31:52AM +0200, Beata Michalska wrote:
> On Mon, Apr 29, 2024 at 02:55:15PM +0530, Viresh Kumar wrote:
> > On 26-04-24, 12:45, Beata Michalska wrote:
> > > It seems that we might need to revisit the discussion we've had around
> > > scaling_cur_freq and cpuinfo_cur_freq and the use of arch_freq_get_on_cpu.
> > > As Vanshi has raised, having both utilizing arch specific feedback for
> > > getting current frequency is bit problematic and might be confusing at best.
> > > As arch_freq_get_on_cpu is already used by show_scaling_cur_freq there are not
> > > many options we are left with, if we want to kee all archs aligned:
> > > we can either try to rework show_scaling_cur_freq and it's use of
> > > arch_freq_get_on_cpu, and move it to cpuinfo_cur_freq, which would align with
> > > relevant docs, though that will not work for x86, or we keep it only there and
> > > skip updating cpuinfo_cur_freq, going against the guidelines. Other options,
> > > purely theoretical, would involve making arch_freq_get_on_cpu aware of type of
> > > the info requested (hw vs sw) or adding yet another arch-specific implementation,
> > > and those are not really appealing alternatives to say at least.
> > > What's your opinion on this one ?
> > 
> > Hi Beata / Vanshidhar,
> > 
> > Lets forget for once what X86 and ARM may have done and think about it
> > once again. I also had a chat with Vincent today about this.
> > 
> > The documentation says it clearly, cpuinfo_cur_freq is the one
> > received from hardware and scaling_cur_freq is the one requested from
> > software.
> > 
> > Now, I know that X86 has made both of them quite similar and I
> > suggested to make them all aligned (and never received a reply on my
> > previous message).
> > 
> > There are few reasons why it may be worth keeping the definition (and
> > behavior) of the sysfs files as is, at least for ARM:
> > - First is that the documentation says so.
> > - There is no point providing the same information via both the
> >   interfaces, there are two interfaces here for a reason.
> > - There maybe tools around which depend on the documented behavior.
> > - From userspace, currently there is only one way to know the exact
> >   frequency that the cpufreq governors have requested from a platform,
> >   i.e. the value from scaling_cur_freq. If we make it similar to
> >   cpuinfo_cur_freq, then userspace will never know about the requested
> >   frequency and the eventual one and if they are same or different.
> > 
> > Lets keep the behavior as is and update only cpuinfo_cur_freq with
> > arch_freq_get_on_cpu().
> > 
> > Makes sense ?
> >
> First of all - apologies for late reply.
> 
> It all makes sense, though to clarify things up, for my own benefit, and to
> avoid any potential confusion ....
> 
> Adding arch_freq_get_on_cpu to cpuinfo_cur_freq does seem to be the right
> approach - no argue on this one. Doing that though means we need a way to
> skip calling arch_freq_get_on_cpu() from show_scaling_cur_freq(), to avoid
> having both providing the same information when that should not be the case.
> In the initial approach [1], that was handled by checking whether the cpufreq
> driver supports 'get' callback (and thus cpuinfo_cur_freq). In case it didn't,
> things remained unchanged for scaling_cur_freq. That does not seem to be a viable
> option though, as there are at least few drivers, that will support both:
> cpuinfo_cur_freq alongside scaling_cur_freq (+ APERF/MPERF) and for those,
> we would hit the initial problem of both relying on arch_freq_get_on_cpu.
> So I guess we need another way of avoiding calling arch_freq_get_on_cpu
> for show_scaling_cur_freq (and most probably avoid calling that one for
> cpuinfo_cur_freq). Quick idea on how to not bring arch specificity into
> cpufreq generic code would be to introduce a new flag for cpufreq drivers though
> that seems a bit stretched. Will ponder a bit about that but in the meantime
> suggestions are more than welcomed.
Alternatively we could add a parameter to arch_freq_get_on_cpu specifying type
of feedback required: hw vs sw. Then the arch specific implementation could
decide which to provide when. It will get slightly counter-intuitive, especially
for cases when sw feedback provides hw one, as it is the case for current
arch_freq_get_on_cpu() and scaling_cur_freq but at least the changes would be
minimal and it will contain handling the tricky bits inside arch specific
implementation - hiding those messy bits.

---
BR
Beata
> 
> Aside: I will most probably send the changes separately from this series to not
> mix things any further.
> 
> ---
> [1] https://lore.kernel.org/all/20230606155754.245998-1-beata.michalska@arm.com/
> ---
> BR
> Beata
> 
> 
> > -- 
> > viresh
Viresh Kumar May 20, 2024, 9:18 a.m. UTC | #9
On 07-05-24, 12:02, Beata Michalska wrote:
> On Tue, May 07, 2024 at 10:31:52AM +0200, Beata Michalska wrote:
> > On Mon, Apr 29, 2024 at 02:55:15PM +0530, Viresh Kumar wrote:
> > > Lets forget for once what X86 and ARM may have done and think about it
> > > once again. I also had a chat with Vincent today about this.
> > > 
> > > The documentation says it clearly, cpuinfo_cur_freq is the one
> > > received from hardware and scaling_cur_freq is the one requested from
> > > software.
> > > 
> > > Now, I know that X86 has made both of them quite similar and I
> > > suggested to make them all aligned (and never received a reply on my
> > > previous message).
> > > 
> > > There are few reasons why it may be worth keeping the definition (and
> > > behavior) of the sysfs files as is, at least for ARM:
> > > - First is that the documentation says so.
> > > - There is no point providing the same information via both the
> > >   interfaces, there are two interfaces here for a reason.
> > > - There maybe tools around which depend on the documented behavior.
> > > - From userspace, currently there is only one way to know the exact
> > >   frequency that the cpufreq governors have requested from a platform,
> > >   i.e. the value from scaling_cur_freq. If we make it similar to
> > >   cpuinfo_cur_freq, then userspace will never know about the requested
> > >   frequency and the eventual one and if they are same or different.
> > > 
> > > Lets keep the behavior as is and update only cpuinfo_cur_freq with
> > > arch_freq_get_on_cpu().
> > > 
> > > Makes sense ?
> > >
> > First of all - apologies for late reply.
> > 
> > It all makes sense, though to clarify things up, for my own benefit, and to
> > avoid any potential confusion ....
> > 
> > Adding arch_freq_get_on_cpu to cpuinfo_cur_freq does seem to be the right
> > approach - no argue on this one. Doing that though means we need a way to
> > skip calling arch_freq_get_on_cpu() from show_scaling_cur_freq(), to avoid
> > having both providing the same information when that should not be the case.
> > In the initial approach [1], that was handled by checking whether the cpufreq
> > driver supports 'get' callback (and thus cpuinfo_cur_freq). In case it didn't,
> > things remained unchanged for scaling_cur_freq. That does not seem to be a viable
> > option though, as there are at least few drivers, that will support both:
> > cpuinfo_cur_freq alongside scaling_cur_freq (+ APERF/MPERF) and for those,
> > we would hit the initial problem of both relying on arch_freq_get_on_cpu.
> > So I guess we need another way of avoiding calling arch_freq_get_on_cpu
> > for show_scaling_cur_freq (and most probably avoid calling that one for
> > cpuinfo_cur_freq). Quick idea on how to not bring arch specificity into
> > cpufreq generic code would be to introduce a new flag for cpufreq drivers though
> > that seems a bit stretched. Will ponder a bit about that but in the meantime
> > suggestions are more than welcomed.
> Alternatively we could add a parameter to arch_freq_get_on_cpu specifying type
> of feedback required: hw vs sw. Then the arch specific implementation could
> decide which to provide when. It will get slightly counter-intuitive, especially
> for cases when sw feedback provides hw one, as it is the case for current
> arch_freq_get_on_cpu() and scaling_cur_freq but at least the changes would be
> minimal and it will contain handling the tricky bits inside arch specific
> implementation - hiding those messy bits.

I think we should just move the call to arch_freq_get_on_cpu() from
show_scaling_cur_freq() to show_cpuinfo_cur_freq() and post it.

Lets see what X86 guys say to that. You can provide all the reasoning
we discussed here, which makes perfect sense.
diff mbox series

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 66e10a19d76a..603533b2608f 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -795,8 +795,10 @@  store_one(scaling_max_freq, max);
 static ssize_t show_cpuinfo_cur_freq(struct cpufreq_policy *policy,
 					char *buf)
 {
-	unsigned int cur_freq = __cpufreq_get(policy);
+	unsigned int cur_freq = arch_freq_get_on_cpu(policy->cpu);
 
+	if (!cur_freq)
+		cur_freq = __cpufreq_get(policy);
 	if (cur_freq)
 		return sprintf(buf, "%u\n", cur_freq);