[PATCHv3,0/6] CPPC optional registers AMD support

Message ID cover.1562781484.git.Janakarajan.Natarajan@amd.com (mailing list archive)

Message

Janakarajan Natarajan July 10, 2019, 6:37 p.m. UTC
CPPC (Collaborative Processor Performance Control) offers optional
registers which can be used to tune the system based on energy and/or
performance requirements.

Newer AMD processors (>= Family 17h) add support for a subset of these
optional CPPC registers, based on ACPI v6.1.

The following are the supported CPPC registers for which sysfs entries
are created:
* enable                (NEW)
* max_perf              (NEW)
* min_perf              (NEW)
* energy_perf
* lowest_perf
* nominal_perf
* desired_perf          (NEW)
* feedback_ctrs
* auto_sel_enable       (NEW)
* lowest_nonlinear_perf

First, update cppc_acpi to create sysfs entries only when the optional
registers are known to be supported.

Next, a new CPUFreq driver is introduced to enable the OSPM and the userspace
to access the newly supported registers through sysfs entries found in
/sys/devices/system/cpu/cpu<num>/amd_cpufreq/.

This new CPUFreq driver can only be used by providing a module parameter,
amd_cpufreq.cppc_enable=1.

The purpose of exposing the registers via the amd-cpufreq sysfs entries is to
allow the userspace to:
* Tweak the values to fit its workload.
* Apply a profile from AMD's optimization guides.
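
For illustration only, a minimal Python sketch of how userspace might read
the entries listed above. The directory layout follows the path given in this
cover letter; the helper name read_cppc_entries, its base parameter, and the
assumption that each entry is a small text file are hypothetical and not part
of the series.

```python
from pathlib import Path

# Entry names taken from the list in this cover letter.
CPPC_ENTRIES = (
    "enable", "max_perf", "min_perf", "energy_perf", "lowest_perf",
    "nominal_perf", "desired_perf", "feedback_ctrs", "auto_sel_enable",
    "lowest_nonlinear_perf",
)


def read_cppc_entries(cpu, base="/sys/devices/system/cpu"):
    """Return {entry: text} for the entries that exist for one CPU.

    Entries are only created for supported registers, so a missing file
    is skipped rather than treated as an error.
    """
    cpu_dir = Path(base) / f"cpu{cpu}" / "amd_cpufreq"
    values = {}
    for name in CPPC_ENTRIES:
        entry = cpu_dir / name
        if entry.is_file():
            values[name] = entry.read_text().strip()
    return values
```

Calling read_cppc_entries(0) on a system without the driver loaded simply
returns an empty dict, since the amd_cpufreq directory would not exist.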

Profiles will be documented in the performance/optimization guides.

Note:
* AMD systems will not have a policy applied in the kernel at this time.

TODO:
* Create a linux userspace tool that will help users generate a CPPC profile
  for their target workload.
* Create a general CPPC policy in the kernel.

v1->v2:
* Add macro to ensure BUFFER only registers have BUFFER type.
* Add support macro to make the right check based on register type.
* Remove support checks for registers which are mandatory.
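
A rough model of the per-type support check mentioned in the v1->v2 notes,
written in Python rather than kernel C for brevity. The class and function
names are hypothetical, and the rule used here (an integer entry counts as
supported when non-zero, a buffer entry when its register descriptor is not
all zeroes) is an assumption about how such a check could work, not a copy of
the patch.

```python
from dataclasses import dataclass

ACPI_TYPE_INTEGER = 1  # ACPI object type codes
ACPI_TYPE_BUFFER = 3


@dataclass
class CpcEntry:
    """Simplified stand-in for one _CPC package element (hypothetical)."""
    type: int
    int_value: int = 0      # used by integer entries
    reg_address: int = 0    # used by buffer entries: register address
    reg_bit_width: int = 0  # used by buffer entries: register width in bits


def cpc_supported(entry):
    """Pick the check that matches the entry's type (assumed rule)."""
    if entry.type == ACPI_TYPE_INTEGER:
        return entry.int_value != 0
    if entry.type == ACPI_TYPE_BUFFER:
        # An all-zero register descriptor is treated as "not implemented".
        return not (entry.reg_address == 0 and entry.reg_bit_width == 0)
    return False


def sysfs_entries_to_create(table):
    """Keep only the names whose backing register is supported."""
    return [name for name, entry in table.items() if cpc_supported(entry)]
```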

v2->v3:
* Introduce new amd-cpufreq driver which will have priority over acpi-cpufreq.
* Move new sysfs entries creation to amd-cpufreq.

Janakarajan Natarajan (3):
  acpi/cppc: Add macros for CPPC register checks
  acpi/cppc: Ensure only supported CPPC sysfs entries are created
  drivers/cpufreq: Add a CPUFreq driver for AMD processors (Fam17h and
    later)

Yazen Ghannam (3):
  acpi/cppc: Rework cppc_set_perf() to use cppc_regs index
  acpi/cppc: Add support for optional CPPC registers
  acpi/cppc: Add support for CPPC Enable register

 drivers/acpi/cppc_acpi.c       | 244 ++++++++++++++++++++++++++++-----
 drivers/cpufreq/Kconfig.x86    |  14 ++
 drivers/cpufreq/Makefile       |   4 +-
 drivers/cpufreq/amd-cpufreq.c  | 233 +++++++++++++++++++++++++++++++
 drivers/cpufreq/cppc_cpufreq.c |   6 +-
 include/acpi/cppc_acpi.h       |  11 +-
 6 files changed, 474 insertions(+), 38 deletions(-)
 create mode 100644 drivers/cpufreq/amd-cpufreq.c

Comments

Peter Zijlstra July 13, 2019, 10:46 a.m. UTC | #1
On Wed, Jul 10, 2019 at 06:37:09PM +0000, Natarajan, Janakarajan wrote:
> CPPC (Collaborative Processor Performance Control) offers optional
> registers which can be used to tune the system based on energy and/or
> performance requirements.
> 
> Newer AMD processors (>= Family 17h) add support for a subset of these
> optional CPPC registers, based on ACPI v6.1.
> 
> The following are the supported CPPC registers for which sysfs entries
> are created:
> * enable                (NEW)
> * max_perf              (NEW)
> * min_perf              (NEW)
> * energy_perf
> * lowest_perf
> * nominal_perf
> * desired_perf          (NEW)
> * feedback_ctrs
> * auto_sel_enable       (NEW)
> * lowest_nonlinear_perf
> 
> First, update cppc_acpi to create sysfs entries only when the optional
> registers are known to be supported.
> 
> Next, a new CPUFreq driver is introduced to enable the OSPM and the userspace
> to access the newly supported registers through sysfs entries found in
> /sys/devices/system/cpu/cpu<num>/amd_cpufreq/.
> 
> This new CPUFreq driver can only be used by providing a module parameter,
> amd_cpufreq.cppc_enable=1.
> 
> The purpose of exposing the registers via the amd-cpufreq sysfs entries is to
> allow the userspace to:
> * Tweak the values to fit its workload.
> * Apply a profile from AMD's optimization guides.

So in general I think it is a huge mistake to expose all that to
userspace. Before you know it, there's tools that actually rely on it,
and then inhibit the kernel from doing anything sane with it.

> Profiles will be documented in the performance/optimization guides.

I don't think userspace can really do anything sane with this; it lacks
much if not all useful information.

> Note:
> * AMD systems will not have a policy applied in the kernel at this time.

And why the heck not? We're trying to move all cpufreq into the
scheduler and have only a single governor, namely schedutil -- yes,
we're still stuck with legacy, and we're still working on performance
parity in some cases, but I really hope to get rid of all other cpufreq
governors eventually.

And if you look at schedutil (schedutil_cpu_util in specific) then
you'll see it is already prepared for CPPC and currently only held back
by the generic cpufreq interface.

It currently only sets desired freq, it has information for
min/guaranteed, and once we get thermal integrated we might have
sensible data for max freq too.
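
To make this concrete, a hypothetical sketch of how a scheduler utilization
value could be turned into a CPPC desired performance request bounded by min
and max. The 25% headroom mirrors the margin schedutil applies when picking a
frequency; the function name, its parameters, and the mapping onto the
abstract perf scale are assumptions for illustration, not existing kernel
code.

```python
def desired_perf_from_util(util, capacity, lowest_perf, highest_perf,
                           min_perf=None, max_perf=None):
    """Map utilization (0..capacity) to a clamped desired perf value."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    # Optional min/max requests can only narrow the platform's range.
    lo = lowest_perf if min_perf is None else max(min_perf, lowest_perf)
    hi = highest_perf if max_perf is None else min(max_perf, highest_perf)
    # Integer form of 1.25 * highest_perf * util / capacity.
    raw = (highest_perf + (highest_perf >> 2)) * util // capacity
    return max(lo, min(raw, hi))
```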

> TODO:
> * Create a linux userspace tool that will help users generate a CPPC profile
>   for their target workload.

Basically a big fat NAK for this approach to cpufreq.

> * Create a general CPPC policy in the kernel.

We already have that, sorta.

Yazen Ghannam July 15, 2019, 5:57 p.m. UTC | #2
> -----Original Message-----
> From: Peter Zijlstra <peterz@infradead.org>
> Sent: Saturday, July 13, 2019 5:46 AM
> To: Natarajan, Janakarajan <Janakarajan.Natarajan@amd.com>
> Cc: linux-acpi@vger.kernel.org; linux-kernel@vger.kernel.org; linux-pm@vger.kernel.org; devel@acpica.org; Rafael J . Wysocki
> <rjw@rjwysocki.net>; Len Brown <lenb@kernel.org>; Viresh Kumar <viresh.kumar@linaro.org>; Robert Moore
> <robert.moore@intel.com>; Erik Schmauss <erik.schmauss@intel.com>; Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Subject: Re: [PATCHv3 0/6] CPPC optional registers AMD support
> 
> On Wed, Jul 10, 2019 at 06:37:09PM +0000, Natarajan, Janakarajan wrote:
> > CPPC (Collaborative Processor Performance Control) offers optional
> > registers which can be used to tune the system based on energy and/or
> > performance requirements.
> >
> > Newer AMD processors (>= Family 17h) add support for a subset of these
> > optional CPPC registers, based on ACPI v6.1.
> >
> > The following are the supported CPPC registers for which sysfs entries
> > are created:
> > * enable                (NEW)
> > * max_perf              (NEW)
> > * min_perf              (NEW)
> > * energy_perf
> > * lowest_perf
> > * nominal_perf
> > * desired_perf          (NEW)
> > * feedback_ctrs
> > * auto_sel_enable       (NEW)
> > * lowest_nonlinear_perf
> >
> > First, update cppc_acpi to create sysfs entries only when the optional
> > registers are known to be supported.
> >
> > Next, a new CPUFreq driver is introduced to enable the OSPM and the userspace
> > to access the newly supported registers through sysfs entries found in
> > /sys/devices/system/cpu/cpu<num>/amd_cpufreq/.
> >
> > This new CPUFreq driver can only be used by providing a module parameter,
> > amd_cpufreq.cppc_enable=1.
> >
> > The purpose of exposing the registers via the amd-cpufreq sysfs entries is to
> > allow the userspace to:
> > * Tweak the values to fit its workload.
> > * Apply a profile from AMD's optimization guides.
> 
> So in general I think it is a huge mistake to expose all that to
> userspace. Before you know it, there's tools that actually rely on it,
> and then inhibit the kernel from doing anything sane with it.
> 

Okay, makes sense.

Is there any way to expose a sysfs interface and make it explicitly
"experimental"? Maybe putting it in Documentation/ABI/testing/?

Or do you think it's just not worth it?

> > Profiles will be documented in the performance/optimization guides.
> 
> I don't think userspace can really do anything sane with this; it lacks
> much if not all useful information.
> 
> > Note:
> > * AMD systems will not have a policy applied in the kernel at this time.
> 
> And why the heck not? We're trying to move all cpufreq into the
> scheduler and have only a single governor, namely schedutil -- yes,
> we're still stuck with legacy, and we're still working on performance
> parity in some cases, but I really hope to get rid of all other cpufreq
> governors eventually.
> 

Because this is new to AMD systems, we didn't want to enforce a default policy.

We figured that exposing the CPPC interface would be a good way to decouple
policy from the kernel and let users experiment/tune their systems, like using
the userspace governor. And if some pattern emerged then we could make that a
default policy in the kernel (for AMD or in general).

But you're saying we should focus more on working with the schedutil governor,
correct? Do you think there's still a use for a userspace governor?

> And if you look at schedutil (schedutil_cpu_util in specific) then
> you'll see it is already prepared for CPPC and currently only held back
> by the generic cpufreq interface.
> 
> It currently only sets desired freq, it has information for
> min/guaranteed, and once we get thermal integrated we might have
> sensible data for max freq too.
> 

Will do.

> > TODO:
> > * Create a linux userspace tool that will help users generate a CPPC profile
> >   for their target workload.
> 
> Basically a big fat NAK for this approach to cpufreq.
> 

Is that for exposing the sysfs interface, having a stub driver, or both?

Would it be better to have a cpufreq driver that implements some policy rather
than just providing the sysfs interface?

> > * Create a general CPPC policy in the kernel.
> 
> We already have that, sorta.

Right, but it seems to still be focused on CPU frequency rather than abstract
performance like how CPPC is defined.

This is another reason for exposing the CPPC interface directly. We'll give
users the ability to interact with the platform, using CPPC, without having to
follow the CPUFREQ paradigm.

Do you think this is doable? Or should we always have some kernel interaction
because of the scheduler, etc.?

Thanks,
Yazen