diff mbox series

[RFC,2/2] platform/x86/amd: pmf: Add manual control support

Message ID 20240926025955.1728766-3-superm1@kernel.org (mailing list archive)
State Superseded, archived
Headers show
Series "custom" ACPI platform profile support | expand

Commit Message

Mario Limonciello Sept. 26, 2024, 2:59 a.m. UTC
From: Mario Limonciello <mario.limonciello@amd.com>

A number of users resort to using reverse engineered software like
ryzenadj to manipulate debugging interfaces for modifying APU settings.

At a glance these tools are useful, but the problem is they break
state machines in other software such as the PMF driver or the OEM
EC.

Offer a knob for PMF to allow 'manual control' which will users can
directly change things like fPPT and sPPT. As this can be harmful for
a system to try to push limits outside of a thermal design, taint the
kernel and show a critical message when in use.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
 Documentation/ABI/testing/sysfs-amd-pmf | 10 +++
 drivers/platform/x86/amd/pmf/Makefile   |  1 +
 drivers/platform/x86/amd/pmf/core.c     |  9 +++
 drivers/platform/x86/amd/pmf/manual.c   | 88 +++++++++++++++++++++++++
 drivers/platform/x86/amd/pmf/pmf.h      |  5 ++
 drivers/platform/x86/amd/pmf/sps.c      |  4 ++
 6 files changed, 117 insertions(+)
 create mode 100644 drivers/platform/x86/amd/pmf/manual.c

Comments

Antheas Kapenekakis Sept. 26, 2024, 8:52 a.m. UTC | #1
Hi Mario,

It is fine to require a module parameter for turning on custom profiles.

However, distributions such as Bazzite use per-device kernel
parameters, which, while user accessible, will not be modified by the
user for 95% of use-cases. In fact, the Bazzite update system manages
the kernel parameters of devices automatically.

What this would mean in practice is that for devices where this custom
control may be used, the module parameter will be set globally for all
of them and taint their kernels.

Instead, only taint the kernel when entering custom mode. If combined
with something such as `custom_mode_choices`, only taint the kernel if
`amd-pmf-user` is selected after that.

> +{
> +       add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
> +       pr_crit("Manual PMF control is enabled, please disable it before "
> +               "reporting any bugs unrelated to PMF.\n");
> +}

"Manual PMF control is enabled. If the device supports other ways of
thermal management, please use those before reporting any bugs
unrelated to PMF. If not, only if setting TDP is required for testing
(e.g., under load in specific thermal conditions), proceed with the
understanding that this module may cause interference, especially with
the amd-gpu driver, the suspend process, and, if the parameters are
out of spec, general stability of the system."

Antheas
Shyam Sundar S K Sept. 26, 2024, 10:25 a.m. UTC | #2
On 9/26/2024 08:29, Mario Limonciello wrote:
> From: Mario Limonciello <mario.limonciello@amd.com>
> 
> A number of users resort to using reverse engineered software like
> ryzenadj to manipulate debugging interfaces for modifying APU settings.
> 
> At a glance these tools are useful, but the problem is they break
> state machines in other software such as the PMF driver or the OEM
> EC.
> 
> Offer a knob for PMF to allow 'manual control' which will users can
> directly change things like fPPT and sPPT. As this can be harmful for
> a system to try to push limits outside of a thermal design, taint the
> kernel and show a critical message when in use.

I appreciate the proposal, but giving users this control seems similar
to using tools like Ryzenadj or Ryzen Master, which are primarily for
overclocking. Atleast Ryzen Master has a dedicated mailbox with PMFW.

While some existing PMF mailboxes are being deprecated, and SPL has
been removed starting with Strix[1] due to the APTS method.

It's important to use some settings together rather than individually
(which the users might not be aware of). For instance, updating SPL
requires corresponding updates to STT limits to avoid negative outcomes.

Additionally, altering these parameters can exceed thermal limits and
potentially void warranties.

Considering CnQF, why not let OEMs opt-in and allow the algorithm to
manage power budgets, rather than providing these controls to users
from the kernel when userspace tools already exist?

Please note that on systems with Smart PC enabled, if users manually
adjust the system thermals, it can lead to the thermal controls
becoming unmanageable.

Please consider this perspective.

[1]
https://github.com/torvalds/linux/blob/master/drivers/platform/x86/amd/pmf/sps.c#L193

IMHO, having a 'custom' platform-profile node is fine, but tieing that
to PMF static slider would be a NO from my side, because of the above
said reasons.

Thanks,
Shyam

> 
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
>  Documentation/ABI/testing/sysfs-amd-pmf | 10 +++
>  drivers/platform/x86/amd/pmf/Makefile   |  1 +
>  drivers/platform/x86/amd/pmf/core.c     |  9 +++
>  drivers/platform/x86/amd/pmf/manual.c   | 88 +++++++++++++++++++++++++
>  drivers/platform/x86/amd/pmf/pmf.h      |  5 ++
>  drivers/platform/x86/amd/pmf/sps.c      |  4 ++
>  6 files changed, 117 insertions(+)
>  create mode 100644 drivers/platform/x86/amd/pmf/manual.c
> 
> diff --git a/Documentation/ABI/testing/sysfs-amd-pmf b/Documentation/ABI/testing/sysfs-amd-pmf
> index 7fc0e1c2b76b..6f3d5cbf443f 100644
> --- a/Documentation/ABI/testing/sysfs-amd-pmf
> +++ b/Documentation/ABI/testing/sysfs-amd-pmf
> @@ -11,3 +11,13 @@ Description:	Reading this file tells if the AMD Platform Management(PMF)
>  		To turn off CnQF user can write "off" to the sysfs node.
>  		Note: Systems that support auto mode will not have this sysfs file
>  		available.
> +
> +What:		/sys/devices/platform/*/{spl, fppt, sppt, sppt_apu_only, stt_min, stt_limit_apu, stt_skip_temp}
> +Date:		December 2024
> +Contact:	Mario Limonciello <mario.limonciello@amd.com>
> +Description:	Manual control of AMD PMF APU coefficients
> +		.
> +		These files are used to manually control the APU coefficients.
> +		In order to write to these files the module most be
> +		loaded with manual_control=1 and the user must write "custom"
> +		to the ACPI platform profile.
> diff --git a/drivers/platform/x86/amd/pmf/Makefile b/drivers/platform/x86/amd/pmf/Makefile
> index 7d6079b02589..81444d6f4428 100644
> --- a/drivers/platform/x86/amd/pmf/Makefile
> +++ b/drivers/platform/x86/amd/pmf/Makefile
> @@ -7,4 +7,5 @@
>  obj-$(CONFIG_AMD_PMF) += amd-pmf.o
>  amd-pmf-objs := core.o acpi.o sps.o \
>  		auto-mode.o cnqf.o \
> +		manual.o \
>  		tee-if.o spc.o pmf-quirks.o
> diff --git a/drivers/platform/x86/amd/pmf/core.c b/drivers/platform/x86/amd/pmf/core.c
> index d6af0ca036f1..52a68ca094be 100644
> --- a/drivers/platform/x86/amd/pmf/core.c
> +++ b/drivers/platform/x86/amd/pmf/core.c
> @@ -53,6 +53,10 @@ static bool force_load;
>  module_param(force_load, bool, 0444);
>  MODULE_PARM_DESC(force_load, "Force load this driver on supported older platforms (experimental)");
>  
> +bool pmf_manual_control;
> +module_param_named(manual_control, pmf_manual_control, bool, 0444);
> +MODULE_PARM_DESC(manual_control, "Expose manual control knobs (experimental)");
> +
>  static int amd_pmf_pwr_src_notify_call(struct notifier_block *nb, unsigned long event, void *data)
>  {
>  	struct amd_pmf_dev *pmf = container_of(nb, struct amd_pmf_dev, pwr_src_notifier);
> @@ -349,6 +353,10 @@ static void amd_pmf_init_features(struct amd_pmf_dev *dev)
>  		dev_dbg(dev->dev, "SPS enabled and Platform Profiles registered\n");
>  	}
>  
> +	if (pmf_manual_control) {
> +		amd_pmf_init_manual_control(dev);
> +		return;
> +	}
>  	amd_pmf_init_smart_pc(dev);
>  	if (dev->smart_pc_enabled) {
>  		dev_dbg(dev->dev, "Smart PC Solution Enabled\n");
> @@ -485,6 +493,7 @@ static void amd_pmf_remove(struct platform_device *pdev)
>  
>  static const struct attribute_group *amd_pmf_driver_groups[] = {
>  	&cnqf_feature_attribute_group,
> +	&manual_attribute_group,
>  	NULL,
>  };
>  
> diff --git a/drivers/platform/x86/amd/pmf/manual.c b/drivers/platform/x86/amd/pmf/manual.c
> new file mode 100644
> index 000000000000..b33fc3cd8d61
> --- /dev/null
> +++ b/drivers/platform/x86/amd/pmf/manual.c
> @@ -0,0 +1,88 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * AMD Platform Management Framework Driver
> + *
> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
> + * All Rights Reserved.
> + *
> + * Author: Mario Limonciello <mario.limonciello@amd.com>
> + */
> +
> +#include "pmf.h"
> +
> +#define pmf_manual_attribute(_name, _set_command, _get_command)		\
> +static ssize_t _name##_store(struct device *d,				\
> +			     struct device_attribute *attr,		\
> +			     const char *buf, size_t count)		\
> +{									\
> +	struct amd_pmf_dev *dev = dev_get_drvdata(d);			\
> +	uint val;							\
> +									\
> +	if (dev->current_profile != PLATFORM_PROFILE_CUSTOM) {		\
> +		dev_warn_once(dev->dev,					\
> +			      "Manual control is disabled, please set "	\
> +			      "platform profile to custom.\n");		\
> +		return -EINVAL;						\
> +	}								\
> +									\
> +	if (kstrtouint(buf, 10, &val) < 0)				\
> +		return -EINVAL;						\
> +									\
> +	amd_pmf_send_cmd(dev, _set_command, false, val, NULL);		\
> +									\
> +	return count;							\
> +}									\
> +static ssize_t _name##_show(struct device *d,				\
> +			   struct device_attribute *attr,		\
> +			   char *buf)					\
> +{									\
> +	struct amd_pmf_dev *dev = dev_get_drvdata(d);			\
> +	uint val;							\
> +									\
> +	amd_pmf_send_cmd(dev, _get_command, true, ARG_NONE, &val);	\
> +									\
> +	return sysfs_emit(buf, "%u\n", val);				\
> +}
> +
> +pmf_manual_attribute(spl, SET_SPL, GET_SPL);
> +static DEVICE_ATTR_RW(spl);
> +pmf_manual_attribute(fppt, SET_FPPT, GET_FPPT);
> +static DEVICE_ATTR_RW(fppt);
> +pmf_manual_attribute(sppt, SET_SPPT, GET_SPPT);
> +static DEVICE_ATTR_RW(sppt);
> +pmf_manual_attribute(sppt_apu_only, SET_SPPT_APU_ONLY, GET_SPPT_APU_ONLY);
> +static DEVICE_ATTR_RW(sppt_apu_only);
> +pmf_manual_attribute(stt_min, SET_STT_MIN_LIMIT, GET_STT_MIN_LIMIT);
> +static DEVICE_ATTR_RW(stt_min);
> +pmf_manual_attribute(stt_limit_apu, SET_STT_LIMIT_APU, GET_STT_LIMIT_APU);
> +static DEVICE_ATTR_RW(stt_limit_apu);
> +pmf_manual_attribute(stt_skin_temp, SET_STT_LIMIT_HS2, GET_STT_LIMIT_HS2);
> +static DEVICE_ATTR_RW(stt_skin_temp);
> +
> +static umode_t manual_attr_is_visible(struct kobject *kobj, struct attribute *attr, int idx)
> +{
> +	return pmf_manual_control ? 0660 : 0;
> +}
> +
> +static struct attribute *manual_attrs[] = {
> +	&dev_attr_spl.attr,
> +	&dev_attr_fppt.attr,
> +	&dev_attr_sppt.attr,
> +	&dev_attr_sppt_apu_only.attr,
> +	&dev_attr_stt_min.attr,
> +	&dev_attr_stt_limit_apu.attr,
> +	&dev_attr_stt_skin_temp.attr,
> +	NULL,
> +};
> +
> +const struct attribute_group manual_attribute_group = {
> +	.attrs = manual_attrs,
> +	.is_visible = manual_attr_is_visible,
> +};
> +
> +void amd_pmf_init_manual_control(struct amd_pmf_dev *dev)
> +{
> +	add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
> +	pr_crit("Manual PMF control is enabled, please disable it before "
> +		"reporting any bugs unrelated to PMF.\n");
> +}
> diff --git a/drivers/platform/x86/amd/pmf/pmf.h b/drivers/platform/x86/amd/pmf/pmf.h
> index 8ce8816da9c1..ca3df63cf190 100644
> --- a/drivers/platform/x86/amd/pmf/pmf.h
> +++ b/drivers/platform/x86/amd/pmf/pmf.h
> @@ -798,4 +798,9 @@ void amd_pmf_dump_ta_inputs(struct amd_pmf_dev *dev, struct ta_pmf_enact_table *
>  /* Quirk infrastructure */
>  void amd_pmf_quirks_init(struct amd_pmf_dev *dev);
>  
> +/* Manual configuration */
> +extern bool pmf_manual_control;
> +extern const struct attribute_group manual_attribute_group;
> +void amd_pmf_init_manual_control(struct amd_pmf_dev *dev);
> +
>  #endif /* PMF_H */
> diff --git a/drivers/platform/x86/amd/pmf/sps.c b/drivers/platform/x86/amd/pmf/sps.c
> index 92f7fb22277d..6db88e523a86 100644
> --- a/drivers/platform/x86/amd/pmf/sps.c
> +++ b/drivers/platform/x86/amd/pmf/sps.c
> @@ -305,6 +305,8 @@ int amd_pmf_get_pprof_modes(struct amd_pmf_dev *pmf)
>  	case PLATFORM_PROFILE_LOW_POWER:
>  		mode = POWER_MODE_POWER_SAVER;
>  		break;
> +	case PLATFORM_PROFILE_CUSTOM:
> +		return 0;
>  	default:
>  		dev_err(pmf->dev, "Unknown Platform Profile.\n");
>  		return -EOPNOTSUPP;
> @@ -412,6 +414,8 @@ int amd_pmf_init_sps(struct amd_pmf_dev *dev)
>  	set_bit(PLATFORM_PROFILE_LOW_POWER, dev->pprof.choices);
>  	set_bit(PLATFORM_PROFILE_BALANCED, dev->pprof.choices);
>  	set_bit(PLATFORM_PROFILE_PERFORMANCE, dev->pprof.choices);
> +	if (pmf_manual_control)
> +		set_bit(PLATFORM_PROFILE_CUSTOM, dev->pprof.choices);
>  
>  	/* Create platform_profile structure and register */
>  	err = platform_profile_register(&dev->pprof);
Antheas Kapenekakis Sept. 26, 2024, 11 a.m. UTC | #3
Hi Shyam,

> I appreciate the proposal, but giving users this control seems similar
> to using tools like Ryzenadj or Ryzen Master, which are primarily for
> overclocking. Atleast Ryzen Master has a dedicated mailbox with PMFW.

In the laptop market I agree with you. However, in the handheld
market, users expect to be able to lower the power envelope of the
device on demand in a granular fashion. As the battery drop is
measured in Watts, tying a slider to Watts is a natural solution.

Most of the time, when those controls are used it is to limit the
thermal envelope of the device, not exceed it. We want to remove the
use of these tools and allow manufacturers the ability to customise
the power envelope they offer to users.

> While some existing PMF mailboxes are being deprecated, and SPL has
> been removed starting with Strix[1] due to the APTS method.
>
> It's important to use some settings together rather than individually
> (which the users might not be aware of). For instance, updating SPL
> requires corresponding updates to STT limits to avoid negative outcomes.

This suggestion was referring to a combined slider, much like the
suggestion below. So STT limits would be modified in tandem,
respecting manufacturer profiles. See comments below.

If you find the name SPL disagreeable, it could be named {tdp,
tdp_min, tdp_max}. This is the solution used by Valve on the Steam
Deck (power1_cap{+min,max}, power2_cap{+min,max}).

In addition, boost is seen as detrimental to handheld devices, with
most users disliking and disabling it. Steam Deck does not use boost.
It is disabled by Steam (power1_cap == power2_cap). So STT and STAPM
are not very relevant. In addition, Steam Deck van gogh has a more
linear response so TDP limits are less required.

> Additionally, altering these parameters can exceed thermal limits and
> potentially void warranties.
>
> Considering CnQF, why not let OEMs opt-in and allow the algorithm to
> manage power budgets, rather than providing these controls to users
> from the kernel when userspace tools already exist?
>
> Please note that on systems with Smart PC enabled, if users manually
> adjust the system thermals, it can lead to the thermal controls
> becoming unmanageable.
>
> Please consider this perspective.
>
> [1]
> https://github.com/torvalds/linux/blob/master/drivers/platform/x86/amd/pmf/sps.c#L193

This slider looks like it would do what we would need. I will note the
importance of tying the slider to Watts to manage user expectation and
adding more gradations (e.g., 15-25, every 1-2W for sub-50W devices).

We have found automatic solutions to not work in the handheld market,
as most AAA games will consume the maximum TDP the profile allows. In
addition, due to performance non-linearities above e.g., 15W,
performance will be similar. For example, on the Legion Go,
performance might increase 20% when going from 17W-25W, however
consumption will increase from ~30W to 45W (50%) which greatly affects
battery life.

Therefore, we need to allow the user to choose between 20% and extra
battery life. If you think we can use an algorithm for this I would
love to know.

Much like you, we dislike AutoTDP solutions that use e.g., RyzenAdj, as they:
 1) Do not respect manufacturer limits
 2) Cause system instability such as stutters when setting values
 3) Can cause crashes if they access the mailbox at the same time as
the AMD drm driver.

Thank you for your time,
Antheas
Mario Limonciello Sept. 26, 2024, 6:09 p.m. UTC | #4
On 9/26/2024 06:00, Antheas Kapenekakis wrote:
> Hi Shyam,
> 
>> I appreciate the proposal, but giving users this control seems similar
>> to using tools like Ryzenadj or Ryzen Master, which are primarily for
>> overclocking. Atleast Ryzen Master has a dedicated mailbox with PMFW.
> 
> In the laptop market I agree with you. However, in the handheld
> market, users expect to be able to lower the power envelope of the
> device on demand in a granular fashion. As the battery drop is
> measured in Watts, tying a slider to Watts is a natural solution.
> 
> Most of the time, when those controls are used it is to limit the
> thermal envelope of the device, not exceed it. We want to remove the
> use of these tools and allow manufacturers the ability to customise
> the power envelope they offer to users.
> 
>> While some existing PMF mailboxes are being deprecated, and SPL has
>> been removed starting with Strix[1] due to the APTS method.

Hmm, what do you think about about offering a wrapper for this for 
people to manipulate?

>>
>> It's important to use some settings together rather than individually
>> (which the users might not be aware of). For instance, updating SPL
>> requires corresponding updates to STT limits to avoid negative outcomes.
> 

The tough part about striking the balance here is how would an end user 
know what values to set in tandem.  I think a lot of people just assume 
they can "just change SPL" and that's it and have a good experience.

> This suggestion was referring to a combined slider, much like the
> suggestion below. So STT limits would be modified in tandem,
> respecting manufacturer profiles. See comments below.
> 
> If you find the name SPL disagreeable, it could be named {tdp,
> tdp_min, tdp_max}. This is the solution used by Valve on the Steam
> Deck (power1_cap{+min,max}, power2_cap{+min,max}).

It's not so much that it's disagreeable term but Shyam is pointing out 
that SPL is no longer a valid argument to the platform mailbox.

> 
> In addition, boost is seen as detrimental to handheld devices, with
> most users disliking and disabling it. Steam Deck does not use boost.
> It is disabled by Steam (power1_cap == power2_cap). So STT and STAPM
> are not very relevant. In addition, Steam Deck van gogh has a more
> linear response so TDP limits are less required.
> 
>> Additionally, altering these parameters can exceed thermal limits and
>> potentially void warranties.
>>
>> Considering CnQF, why not let OEMs opt-in and allow the algorithm to
>> manage power budgets, rather than providing these controls to users
>> from the kernel when userspace tools already exist?

The problem is all of the RE tools rely upon PCI config space access or 
/dev/mem access to manipulate undocumented register offsets.

When the system is under kernel lockdown (such as with distro kernel 
when UEFI secure boot is turned on) then those interfaces are 
intentionally locked down.

That's why I'm hoping we can strike some sort of balance at the request 
for some advanced users being able to tune values in a predictable 
fashion while also allowing OEMs to configure policies like CNQF or 
Smart PC when users for users that don't tinker.

>>
>> Please note that on systems with Smart PC enabled, if users manually
>> adjust the system thermals, it can lead to the thermal controls
>> becoming unmanageable.

Yeah; that's why as this RFC patch I didn't let CNQF, ITS or Smart PC 
initialize.  Basically if manual control is enabled then "SPS" and 
manual sysfs control is the only thing available.

> 
> Much like you, we dislike AutoTDP solutions that use e.g., RyzenAdj, as they:
>   1) Do not respect manufacturer limits
>   2) Cause system instability such as stutters when setting values
>   3) Can cause crashes if they access the mailbox at the same time as
> the AMD drm driver.
> 

Yes.  Exactly why I feel that if we offer an interface instead people 
can use such an interface instead of these tools.
Mario Limonciello Sept. 26, 2024, 6:53 p.m. UTC | #5
On 9/26/2024 03:52, Antheas Kapenekakis wrote:
> Hi Mario,
> 
> It is fine to require a module parameter for turning on custom profiles.
> 
> However, distributions such as Bazzite use per-device kernel
> parameters, which, while user accessible, will not be modified by the
> user for 95% of use-cases. In fact, the Bazzite update system manages
> the kernel parameters of devices automatically.
> 
> What this would mean in practice is that for devices where this custom
> control may be used, the module parameter will be set globally for all
> of them and taint their kernels.
> 
> Instead, only taint the kernel when entering custom mode. If combined
> with something such as `custom_mode_choices`, only taint the kernel if
> `amd-pmf-user` is selected after that.

Yeah if we continue down a variation of this direction that is a 
sensible change to push the taint down to only when in it's actually 
been used, not just when the module parameter is set.

> 
>> +{
>> +       add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
>> +       pr_crit("Manual PMF control is enabled, please disable it before "
>> +               "reporting any bugs unrelated to PMF.\n");
>> +}
> 
> "Manual PMF control is enabled. If the device supports other ways of
> thermal management, please use those before reporting any bugs
> unrelated to PMF. If not, only if setting TDP is required for testing
> (e.g., under load in specific thermal conditions), proceed with the
> understanding that this module may cause interference, especially with
> the amd-gpu driver, the suspend process, and, if the parameters are
> out of spec, general stability of the system."
> 
> Antheas
Derek John Clark Sept. 27, 2024, 4:01 a.m. UTC | #6
Hello all,

>>> I appreciate the proposal, but giving users this control seems similar
>>> to using tools like Ryzenadj or Ryzen Master, which are primarily for
>>> overclocking. Atleast Ryzen Master has a dedicated mailbox with PMFW.

> In the laptop market I agree with you. However, in the handheld
> market, users expect to be able to lower the power envelope of the
> device on demand in a granular fashion. As the battery drop is
> measured in Watts, tying a slider to Watts is a natural solution.
>
> Most of the time, when those controls are used it is to limit the
> thermal envelope of the device, not exceed it. We want to remove the
> use of these tools and allow manufacturers the ability to customise
> the power envelope they offer to users.

I agree with Mario here. Due to the use case and battery size, handheld users
intent is to minimize power draw while maintaining performance. The typical use
case in "per-watt" control is to manage this. It is usually done
per-game (either
manually by the user or recalled with userspace tools). We have found that the
'performance' profile alone won't always limit the power enough, which decreases
the playtime, and 'balanced' will limit it too much, decreasing performance.
Users want to be able to tune a "sweet spot" to get maximal performance from
both metrics.

>>> While some existing PMF mailboxes are being deprecated, and SPL has
>>> been removed starting with Strix[1] due to the APTS method.

>Hmm, what do you think about about offering a wrapper for this for
>people to manipulate?

>>> It's important to use some settings together rather than individually
>>> (which the users might not be aware of). For instance, updating SPL
>>> requires corresponding updates to STT limits to avoid negative outcomes.

>The tough part about striking the balance here is how would an end user
>know what values to set in tandem.  I think a lot of people just assume
>they can "just change SPL" and that's it and have a good experience.

I'm unsure of the generalized case here, but if we are using this to limit SPL
rather than raise it over design spec, then it would seem to me that STT would
be set to match or exceed SPL and the cooling solution would be able to
compensate for that. I'm happy to be corrected here if this is not a correct
assumption. I think there may be some variation on how manufacturers implement
this in the BIOS. For example, the Legion Go uses STT to push TDP to the
thermal limit (between SPL and SPPT) when using the ACPI profiles. Their
"custom" profile changes the behavior to fully respect the user set SPL/SPPT/
FPPT. I'm not sure if/how others handle this differently. In any case,
I would expect
the driver could handle this.

>> This suggestion was referring to a combined slider, much like the
>> suggestion below. So STT limits would be modified in tandem,
>> respecting manufacturer profiles. See comments below.
>>
>> If you find the name SPL disagreeable, it could be named {tdp,
>> tdp_min, tdp_max}. This is the solution used by Valve on the Steam
>> Deck (power1_cap{+min,max}, power2_cap{+min,max}).

>It's not so much that it's disagreeable term but Shyam is pointing out
>that SPL is no longer a valid argument to the platform mailbox.

I think intuitive generic terms would be ideal. [ppt|sppt|fppt]_limit[_min|_max]
are well understood by power users currently. There should be some
terminology that applies generally across different implementations of similar
concepts.

>> In addition, boost is seen as detrimental to handheld devices, with
>> most users disliking and disabling it. Steam Deck does not use boost.
>> It is disabled by Steam (power1_cap == power2_cap). So STT and STAPM
>> are not very relevant. In addition, Steam Deck van gogh has a more
>> linear response so TDP limits are less required.

I find this to be case by case, some games have more sudden/dynamic loads and
need the extra overhead, while others will waste it. Flexibility is important I
think. The Deck also benefits from scale and Steam integration right now so
publishers are able to, and do, tune for that device specifically. I don't know
how far the lessons from that device transfer to other handhelds. Having them
as an option, even if unused, would be a benefit.

>>> Additionally, altering these parameters can exceed thermal limits and
>>> potentially void warranties.
>>>
>>> Considering CnQF, why not let OEMs opt-in and allow the algorithm to
>>> manage power budgets, rather than providing these controls to users
>>> from the kernel when userspace tools already exist?

>The problem is all of the RE tools rely upon PCI config space access or
>/dev/mem access to manipulate undocumented register offsets.
>
>When the system is under kernel lockdown (such as with distro kernel
>when UEFI secure boot is turned on) then those interfaces are
>intentionally locked down.
>
>That's why I'm hoping we can strike some sort of balance at the request
>for some advanced users being able to tune values in a predictable
>fashion while also allowing OEMs to configure policies like CNQF or
>Smart PC when users for users that don't tinker.

>>> Please note that on systems with Smart PC enabled, if users manually
>>> adjust the system thermals, it can lead to the thermal controls
>>> becoming unmanageable.

>Yeah; that's why as this RFC patch I didn't let CNQF, ITS or Smart PC
>initialize.  Basically if manual control is enabled then "SPS" and
>manual sysfs control is the only thing available.

>> Much like you, we dislike AutoTDP solutions that use e.g., RyzenAdj, as they:
>>   1) Do not respect manufacturer limits
>>   2) Cause system instability such as stutters when setting values
>>   3) Can cause crashes if they access the mailbox at the same time as
>> the AMD drm driver.

>Yes.  Exactly why I feel that if we offer an interface instead people
>can use such an interface instead of these tools.

The general consensus on the userspace development side is that we'd like to
move away from needing to do these hacks to get the most from the hardware. I
would say things like RyzenAdj, ryzen_smu, acpi_call, etc. have provided enough
evidence that there is a gap in the baseline functionality that will ultimately
be filled *somehow*. I'd like to see a move towards this as an acceptable
in-kernel standard to replace those tools. In that same vein I think it's
important that common sense defaults and manufacturers intent are respected. I
have some ideas for how that could be done. If information is not
available for a
given device then the "custom" parameter will not be available in
power_profile_available and attempts to set to it will -EINVAL.
Similarly, have PPT
attrs only show in sysfs if that data is available.

- For legacy devices (and likely many smaller "boutique" manufacturers for the
  foreseeable future) a DMI table with limits for each supported attribute
  could provide this. Limiting this table to handhelds specifically would be
  acceptable to me, I don't see the value for laptops personally. For almost
  everything on the market currently we have this data, provided by the OEM.
- For devices with WMI (Legion Go, ROG Ally) the manufacturer has provided the
  methods to get these limits directly, so that could be handled in the
  appropriate manufacturer WMI drivers.
- For future devices this information should be (is?) included in the
  PMF tables in the BIOS and enabled automatically when detected by the driver,
  which will hopefully reduce the number of necessary kernel patches going
  forward.
Antheas Kapenekakis Sept. 27, 2024, 8:44 a.m. UTC | #7
Hi Mario,

On Thu, 26 Sept 2024 at 20:09, Mario Limonciello <superm1@kernel.org> wrote:
>
> On 9/26/2024 06:00, Antheas Kapenekakis wrote:
> > Hi Shyam,
> >
> >> I appreciate the proposal, but giving users this control seems similar
> >> to using tools like Ryzenadj or Ryzen Master, which are primarily for
> >> overclocking. Atleast Ryzen Master has a dedicated mailbox with PMFW.
> >
> > In the laptop market I agree with you. However, in the handheld
> > market, users expect to be able to lower the power envelope of the
> > device on demand in a granular fashion. As the battery drop is
> > measured in Watts, tying a slider to Watts is a natural solution.
> >
> > Most of the time, when those controls are used it is to limit the
> > thermal envelope of the device, not exceed it. We want to remove the
> > use of these tools and allow manufacturers the ability to customise
> > the power envelope they offer to users.
> >
> >> While some existing PMF mailboxes are being deprecated, and SPL has
> >> been removed starting with Strix[1] due to the APTS method.
>
> Hmm, what do you think about about offering a wrapper for this for
> people to manipulate?

Having a single call that sets everything would be my preference, so I
would support this.

Although looking at [1], seems like it will be separate calls anway.

Link: https://github.com/torvalds/linux/blob/master/drivers/platform/x86/amd/pmf/sps.c#L193
[1]

> >> It's important to use some settings together rather than individually
> >> (which the users might not be aware of). For instance, updating SPL
> >> requires corresponding updates to STT limits to avoid negative outcomes.
> >
>
> The tough part about striking the balance here is how would an end user
> know what values to set in tandem.  I think a lot of people just assume
> they can "just change SPL" and that's it and have a good experience.

Spoken like a true linux user. Users do not know what a kernel or
sysfs is and they will not be touching any of this. It just needs to
be baby-proofed enough so for the 5 users that do it is safe.

Let us focus on the problem here. There are currently around 5
manufacturers shipping products in a space where granular TDP control
is expected and where AMD has not provided them with a solution.

And for this, there are two issues. First, there is no standard for
granular TDP control tuned by the manufacturer. Second, when such a
standard is created, there is a healthy pool of devices in the market
where the manufacturer cannot be expected to provide an updated BIOS
for them.
Therefore, we need a proposal where 1) the manufacturer can provide
granular TDP controls in a fully customizable manner (e.g., with a LUT
that controls everything), and 2) for devices that will not get that
tuning, a custom profile setting that will expose important tuning
parameters to userspace so that we can retrofit it and extend the
their lifespan.

> > This suggestion was referring to a combined slider, much like the
> > suggestion below. So STT limits would be modified in tandem,
> > respecting manufacturer profiles. See comments below.
> >
> > If you find the name SPL disagreeable, it could be named {tdp,
> > tdp_min, tdp_max}. This is the solution used by Valve on the Steam
> > Deck (power1_cap{+min,max}, power2_cap{+min,max}).
>
> It's not so much that it's disagreeable term but Shyam is pointing out
> that SPL is no longer a valid argument to the platform mailbox.

I'd tend to agree since the current mailbox targets that I know of are
STAPM limit (for STAPM) and skin temp limit (for STT). Since you used
the term SPL, I carried that over to the proposal, but it would not
control SPL. Instead it would control both of the former, including
sPPT and fPPT (if that is still supported; unclear in [1]; but
disabling boost will be a requirement).

> >
> > In addition, boost is seen as detrimental to handheld devices, with
> > most users disliking and disabling it. Steam Deck does not use boost.
> > It is disabled by Steam (power1_cap == power2_cap). So STT and STAPM
> > are not very relevant. In addition, Steam Deck van gogh has a more
> > linear response so TDP limits are less required.
> >
> >> Additionally, altering these parameters can exceed thermal limits and
> >> potentially void warranties.
> >>
> >> Considering CnQF, why not let OEMs opt-in and allow the algorithm to
> >> manage power budgets, rather than providing these controls to users
> >> from the kernel when userspace tools already exist?
>
> The problem is all of the RE tools rely upon PCI config space access or
> /dev/mem access to manipulate undocumented register offsets.
>
> When the system is under kernel lockdown (such as with distro kernel
> when UEFI secure boot is turned on) then those interfaces are
> intentionally locked down.
>
> That's why I'm hoping we can strike some sort of balance at the request
> for some advanced users being able to tune values in a predictable
> fashion while also allowing OEMs to configure policies like CNQF or
> Smart PC when users for users that don't tinker.

I will have to repeat that as far as the handheld market is concerned,
we are not talking about advanced users. Instead, we are talking for
all users.

> >>
> >> Please note that on systems with Smart PC enabled, if users manually
> >> adjust the system thermals, it can lead to the thermal controls
> >> becoming unmanageable.
>
> Yeah; that's why as this RFC patch I didn't let CNQF, ITS or Smart PC
> initialize.  Basically if manual control is enabled then "SPS" and
> manual sysfs control is the only thing available.

Sounds like you have your work cut out for you if the custom profile
is supposed to dynamically load.

> >
> > Much like you, we dislike AutoTDP solutions that use e.g., RyzenAdj, as they:
> >   1) Do not respect manufacturer limits
> >   2) Cause system instability such as stutters when setting values
> >   3) Can cause crashes if they access the mailbox at the same time as
> > the AMD drm driver.
> >
>
> Yes.  Exactly why I feel that if we offer an interface instead people
> can use such an interface instead of these tools.

While (in Bazzite) we have a solution that works very reliably and is
safe (not RyzenAdj), we have to begin cleaning up loose ends so that
we can 1) enable TDP control in a stock secureboot kernel with early
lockdown enabled (e.g., Fedora), 2) provide manufacturers with certain
reliability guarantees so they can warranty units running under linux,
3) prepare our solutions for being packaged in upstream distribution
repositories (Debian, Fedora), where using an existing solution is a
blocker as they do not provide or should provide such hardware access
when secure boot is enabled.

Though, since manufacturers like Ayaneo currently use RyzenAdj in
Windows, I might be nitpicking too much.

As for why Secure Boot is important, let add [2], where Rockstar
points the finger to Valve for BattlEye not working. Much of the
anticheat issue is due to the fact that it is trivial to cheat without
having a secureboot enabled kernel with the early lockdown flag
engaged, as it allows both custom drivers and userspace to gain access
to sensitive process memory in a way that is undetectable by
anticheat. Vanguard does not work in Linux for much of the same
reason.

Steam Deck is, for those uninitiated, a device that does not carry
Secureboot keys, and SteamOS is a distribution that does not support
Secure boot. Although both can change (Steam Deck BIOS supports secure
boot). However, Bazzite is secure boot enabled and we encourage our
users to leave it enabled, although for the moment they have to enroll
our MOK key, which most of them do.

Antheas

Link: https://www.pcgamer.com/games/grand-theft-auto/gta-online-is-no-longer-compatible-with-steam-deck-thanks-to-its-new-anti-cheat-software-despite-battleye-having-an-opt-in-system-for-this-sort-of-thing/
[2]
Antheas Kapenekakis Dec. 19, 2024, 1:12 p.m. UTC | #8
Hi Mario,
given that there is a Legion Go driver in the works, and Asus already
has a driver, the only thing that would be left for locking down ACPI
access is manufacturers w/o vendor APIs.

So, can we restart the conversation about this driver? It would be
nice to get to a place where we can lock down /dev/mem and ACPI by
spring.

Moreover, since the other two proposed drivers use the
firmware_attributes API, should this be used here as well?

By the way, you were right about needing a taint for this. Strix Point
fails to enter a lower power state during sleep if you set it to lower
than 10W. This is not ideal, as hawk point could go down to 5 while
still showing a power difference, but I am unsure where this bug
should be reported. This is both through ryzenadj/ALIB

Antheas

On Thu, 26 Sept 2024 at 05:00, Mario Limonciello <superm1@kernel.org> wrote:
>
> From: Mario Limonciello <mario.limonciello@amd.com>
>
> A number of users resort to using reverse engineered software like
> ryzenadj to manipulate debugging interfaces for modifying APU settings.
>
> At a glance these tools are useful, but the problem is they break
> state machines in other software such as the PMF driver or the OEM
> EC.
>
> Offer a knob for PMF to allow 'manual control' which will users can
> directly change things like fPPT and sPPT. As this can be harmful for
> a system to try to push limits outside of a thermal design, taint the
> kernel and show a critical message when in use.
>
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
>  Documentation/ABI/testing/sysfs-amd-pmf | 10 +++
>  drivers/platform/x86/amd/pmf/Makefile   |  1 +
>  drivers/platform/x86/amd/pmf/core.c     |  9 +++
>  drivers/platform/x86/amd/pmf/manual.c   | 88 +++++++++++++++++++++++++
>  drivers/platform/x86/amd/pmf/pmf.h      |  5 ++
>  drivers/platform/x86/amd/pmf/sps.c      |  4 ++
>  6 files changed, 117 insertions(+)
>  create mode 100644 drivers/platform/x86/amd/pmf/manual.c
>
> diff --git a/Documentation/ABI/testing/sysfs-amd-pmf b/Documentation/ABI/testing/sysfs-amd-pmf
> index 7fc0e1c2b76b..6f3d5cbf443f 100644
> --- a/Documentation/ABI/testing/sysfs-amd-pmf
> +++ b/Documentation/ABI/testing/sysfs-amd-pmf
> @@ -11,3 +11,13 @@ Description: Reading this file tells if the AMD Platform Management(PMF)
>                 To turn off CnQF user can write "off" to the sysfs node.
>                 Note: Systems that support auto mode will not have this sysfs file
>                 available.
> +
> +What:          /sys/devices/platform/*/{spl, fppt, sppt, sppt_apu_only, stt_min, stt_limit_apu, stt_skip_temp}
> +Date:          December 2024
> +Contact:       Mario Limonciello <mario.limonciello@amd.com>
> +Description:   Manual control of AMD PMF APU coefficients
> +               .
> +               These files are used to manually control the APU coefficients.
> +               In order to write to these files the module most be
> +               loaded with manual_control=1 and the user must write "custom"
> +               to the ACPI platform profile.
> diff --git a/drivers/platform/x86/amd/pmf/Makefile b/drivers/platform/x86/amd/pmf/Makefile
> index 7d6079b02589..81444d6f4428 100644
> --- a/drivers/platform/x86/amd/pmf/Makefile
> +++ b/drivers/platform/x86/amd/pmf/Makefile
> @@ -7,4 +7,5 @@
>  obj-$(CONFIG_AMD_PMF) += amd-pmf.o
>  amd-pmf-objs := core.o acpi.o sps.o \
>                 auto-mode.o cnqf.o \
> +               manual.o \
>                 tee-if.o spc.o pmf-quirks.o
> diff --git a/drivers/platform/x86/amd/pmf/core.c b/drivers/platform/x86/amd/pmf/core.c
> index d6af0ca036f1..52a68ca094be 100644
> --- a/drivers/platform/x86/amd/pmf/core.c
> +++ b/drivers/platform/x86/amd/pmf/core.c
> @@ -53,6 +53,10 @@ static bool force_load;
>  module_param(force_load, bool, 0444);
>  MODULE_PARM_DESC(force_load, "Force load this driver on supported older platforms (experimental)");
>
> +bool pmf_manual_control;
> +module_param_named(manual_control, pmf_manual_control, bool, 0444);
> +MODULE_PARM_DESC(manual_control, "Expose manual control knobs (experimental)");
> +
>  static int amd_pmf_pwr_src_notify_call(struct notifier_block *nb, unsigned long event, void *data)
>  {
>         struct amd_pmf_dev *pmf = container_of(nb, struct amd_pmf_dev, pwr_src_notifier);
> @@ -349,6 +353,10 @@ static void amd_pmf_init_features(struct amd_pmf_dev *dev)
>                 dev_dbg(dev->dev, "SPS enabled and Platform Profiles registered\n");
>         }
>
> +       if (pmf_manual_control) {
> +               amd_pmf_init_manual_control(dev);
> +               return;
> +       }
>         amd_pmf_init_smart_pc(dev);
>         if (dev->smart_pc_enabled) {
>                 dev_dbg(dev->dev, "Smart PC Solution Enabled\n");
> @@ -485,6 +493,7 @@ static void amd_pmf_remove(struct platform_device *pdev)
>
>  static const struct attribute_group *amd_pmf_driver_groups[] = {
>         &cnqf_feature_attribute_group,
> +       &manual_attribute_group,
>         NULL,
>  };
>
> diff --git a/drivers/platform/x86/amd/pmf/manual.c b/drivers/platform/x86/amd/pmf/manual.c
> new file mode 100644
> index 000000000000..b33fc3cd8d61
> --- /dev/null
> +++ b/drivers/platform/x86/amd/pmf/manual.c
> @@ -0,0 +1,88 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * AMD Platform Management Framework Driver
> + *
> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
> + * All Rights Reserved.
> + *
> + * Author: Mario Limonciello <mario.limonciello@amd.com>
> + */
> +
> +#include "pmf.h"
> +
> +#define pmf_manual_attribute(_name, _set_command, _get_command)                \
> +static ssize_t _name##_store(struct device *d,                         \
> +                            struct device_attribute *attr,             \
> +                            const char *buf, size_t count)             \
> +{                                                                      \
> +       struct amd_pmf_dev *dev = dev_get_drvdata(d);                   \
> +       uint val;                                                       \
> +                                                                       \
> +       if (dev->current_profile != PLATFORM_PROFILE_CUSTOM) {          \
> +               dev_warn_once(dev->dev,                                 \
> +                             "Manual control is disabled, please set " \
> +                             "platform profile to custom.\n");         \
> +               return -EINVAL;                                         \
> +       }                                                               \
> +                                                                       \
> +       if (kstrtouint(buf, 10, &val) < 0)                              \
> +               return -EINVAL;                                         \
> +                                                                       \
> +       amd_pmf_send_cmd(dev, _set_command, false, val, NULL);          \
> +                                                                       \
> +       return count;                                                   \
> +}                                                                      \
> +static ssize_t _name##_show(struct device *d,                          \
> +                          struct device_attribute *attr,               \
> +                          char *buf)                                   \
> +{                                                                      \
> +       struct amd_pmf_dev *dev = dev_get_drvdata(d);                   \
> +       uint val;                                                       \
> +                                                                       \
> +       amd_pmf_send_cmd(dev, _get_command, true, ARG_NONE, &val);      \
> +                                                                       \
> +       return sysfs_emit(buf, "%u\n", val);                            \
> +}
> +
> +pmf_manual_attribute(spl, SET_SPL, GET_SPL);
> +static DEVICE_ATTR_RW(spl);
> +pmf_manual_attribute(fppt, SET_FPPT, GET_FPPT);
> +static DEVICE_ATTR_RW(fppt);
> +pmf_manual_attribute(sppt, SET_SPPT, GET_SPPT);
> +static DEVICE_ATTR_RW(sppt);
> +pmf_manual_attribute(sppt_apu_only, SET_SPPT_APU_ONLY, GET_SPPT_APU_ONLY);
> +static DEVICE_ATTR_RW(sppt_apu_only);
> +pmf_manual_attribute(stt_min, SET_STT_MIN_LIMIT, GET_STT_MIN_LIMIT);
> +static DEVICE_ATTR_RW(stt_min);
> +pmf_manual_attribute(stt_limit_apu, SET_STT_LIMIT_APU, GET_STT_LIMIT_APU);
> +static DEVICE_ATTR_RW(stt_limit_apu);
> +pmf_manual_attribute(stt_skin_temp, SET_STT_LIMIT_HS2, GET_STT_LIMIT_HS2);
> +static DEVICE_ATTR_RW(stt_skin_temp);
> +
> +static umode_t manual_attr_is_visible(struct kobject *kobj, struct attribute *attr, int idx)
> +{
> +       return pmf_manual_control ? 0660 : 0;
> +}
> +
> +static struct attribute *manual_attrs[] = {
> +       &dev_attr_spl.attr,
> +       &dev_attr_fppt.attr,
> +       &dev_attr_sppt.attr,
> +       &dev_attr_sppt_apu_only.attr,
> +       &dev_attr_stt_min.attr,
> +       &dev_attr_stt_limit_apu.attr,
> +       &dev_attr_stt_skin_temp.attr,
> +       NULL,
> +};
> +
> +const struct attribute_group manual_attribute_group = {
> +       .attrs = manual_attrs,
> +       .is_visible = manual_attr_is_visible,
> +};
> +
> +void amd_pmf_init_manual_control(struct amd_pmf_dev *dev)
> +{
> +       add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
> +       pr_crit("Manual PMF control is enabled, please disable it before "
> +               "reporting any bugs unrelated to PMF.\n");
> +}
> diff --git a/drivers/platform/x86/amd/pmf/pmf.h b/drivers/platform/x86/amd/pmf/pmf.h
> index 8ce8816da9c1..ca3df63cf190 100644
> --- a/drivers/platform/x86/amd/pmf/pmf.h
> +++ b/drivers/platform/x86/amd/pmf/pmf.h
> @@ -798,4 +798,9 @@ void amd_pmf_dump_ta_inputs(struct amd_pmf_dev *dev, struct ta_pmf_enact_table *
>  /* Quirk infrastructure */
>  void amd_pmf_quirks_init(struct amd_pmf_dev *dev);
>
> +/* Manual configuration */
> +extern bool pmf_manual_control;
> +extern const struct attribute_group manual_attribute_group;
> +void amd_pmf_init_manual_control(struct amd_pmf_dev *dev);
> +
>  #endif /* PMF_H */
> diff --git a/drivers/platform/x86/amd/pmf/sps.c b/drivers/platform/x86/amd/pmf/sps.c
> index 92f7fb22277d..6db88e523a86 100644
> --- a/drivers/platform/x86/amd/pmf/sps.c
> +++ b/drivers/platform/x86/amd/pmf/sps.c
> @@ -305,6 +305,8 @@ int amd_pmf_get_pprof_modes(struct amd_pmf_dev *pmf)
>         case PLATFORM_PROFILE_LOW_POWER:
>                 mode = POWER_MODE_POWER_SAVER;
>                 break;
> +       case PLATFORM_PROFILE_CUSTOM:
> +               return 0;
>         default:
>                 dev_err(pmf->dev, "Unknown Platform Profile.\n");
>                 return -EOPNOTSUPP;
> @@ -412,6 +414,8 @@ int amd_pmf_init_sps(struct amd_pmf_dev *dev)
>         set_bit(PLATFORM_PROFILE_LOW_POWER, dev->pprof.choices);
>         set_bit(PLATFORM_PROFILE_BALANCED, dev->pprof.choices);
>         set_bit(PLATFORM_PROFILE_PERFORMANCE, dev->pprof.choices);
> +       if (pmf_manual_control)
> +               set_bit(PLATFORM_PROFILE_CUSTOM, dev->pprof.choices);
>
>         /* Create platform_profile structure and register */
>         err = platform_profile_register(&dev->pprof);
> --
> 2.43.0
>
Mario Limonciello Dec. 19, 2024, 2:49 p.m. UTC | #9
On 12/19/2024 07:12, Antheas Kapenekakis wrote:
> Hi Mario,
> given that there is a Legion Go driver in the works, and Asus already
> has a driver, the only thing that would be left for locking down ACPI
> access is manufacturers w/o vendor APIs.
> 
> So, can we restart the conversation about this driver? It would be
> nice to get to a place where we can lock down /dev/mem and ACPI by
> spring.

As Shyam mentioned we don't have control for limits by the PMF driver 
for this on PMF v2 (Strix) or later platforms.

So if we were to revive this custom discussion it would only be for 
Phoenix and Hawk Point platforms.

> 
> Moreover, since the other two proposed drivers use the
> firmware_attributes API, should this be used here as well?

I do feel that if we revive this conversation specifically for Phoenix 
and Hawk Point platforms yes we should use the same API to expose it to 
userspace as those other two drivers do.

I'd like Shyam's temperature on this idea though before anyone spends 
time on it.  If he's amenable would you want to work on it?

> 
> By the way, you were right about needing a taint for this. Strix Point
> fails to enter a lower power state during sleep if you set it to lower
> than 10W. This is not ideal, as hawk point could go down to 5 while
> still showing a power difference, but I am unsure where this bug
> should be reported. This is both through ryzenadj/ALIB

Who is to say this is a bug?  Abusing a debugging interface with a 
reverse engineered tool means you might be able to configure a platform 
out of specifications.

> 
> Antheas
> 
> On Thu, 26 Sept 2024 at 05:00, Mario Limonciello <superm1@kernel.org> wrote:
>>
>> From: Mario Limonciello <mario.limonciello@amd.com>
>>
>> A number of users resort to using reverse engineered software like
>> ryzenadj to manipulate debugging interfaces for modifying APU settings.
>>
>> At a glance these tools are useful, but the problem is they break
>> state machines in other software such as the PMF driver or the OEM
>> EC.
>>
>> Offer a knob for PMF to allow 'manual control' which will users can
>> directly change things like fPPT and sPPT. As this can be harmful for
>> a system to try to push limits outside of a thermal design, taint the
>> kernel and show a critical message when in use.
>>
>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>> ---
>>   Documentation/ABI/testing/sysfs-amd-pmf | 10 +++
>>   drivers/platform/x86/amd/pmf/Makefile   |  1 +
>>   drivers/platform/x86/amd/pmf/core.c     |  9 +++
>>   drivers/platform/x86/amd/pmf/manual.c   | 88 +++++++++++++++++++++++++
>>   drivers/platform/x86/amd/pmf/pmf.h      |  5 ++
>>   drivers/platform/x86/amd/pmf/sps.c      |  4 ++
>>   6 files changed, 117 insertions(+)
>>   create mode 100644 drivers/platform/x86/amd/pmf/manual.c
>>
>> diff --git a/Documentation/ABI/testing/sysfs-amd-pmf b/Documentation/ABI/testing/sysfs-amd-pmf
>> index 7fc0e1c2b76b..6f3d5cbf443f 100644
>> --- a/Documentation/ABI/testing/sysfs-amd-pmf
>> +++ b/Documentation/ABI/testing/sysfs-amd-pmf
>> @@ -11,3 +11,13 @@ Description: Reading this file tells if the AMD Platform Management(PMF)
>>                  To turn off CnQF user can write "off" to the sysfs node.
>>                  Note: Systems that support auto mode will not have this sysfs file
>>                  available.
>> +
>> +What:          /sys/devices/platform/*/{spl, fppt, sppt, sppt_apu_only, stt_min, stt_limit_apu, stt_skip_temp}
>> +Date:          December 2024
>> +Contact:       Mario Limonciello <mario.limonciello@amd.com>
>> +Description:   Manual control of AMD PMF APU coefficients
>> +               .
>> +               These files are used to manually control the APU coefficients.
>> +               In order to write to these files the module most be
>> +               loaded with manual_control=1 and the user must write "custom"
>> +               to the ACPI platform profile.
>> diff --git a/drivers/platform/x86/amd/pmf/Makefile b/drivers/platform/x86/amd/pmf/Makefile
>> index 7d6079b02589..81444d6f4428 100644
>> --- a/drivers/platform/x86/amd/pmf/Makefile
>> +++ b/drivers/platform/x86/amd/pmf/Makefile
>> @@ -7,4 +7,5 @@
>>   obj-$(CONFIG_AMD_PMF) += amd-pmf.o
>>   amd-pmf-objs := core.o acpi.o sps.o \
>>                  auto-mode.o cnqf.o \
>> +               manual.o \
>>                  tee-if.o spc.o pmf-quirks.o
>> diff --git a/drivers/platform/x86/amd/pmf/core.c b/drivers/platform/x86/amd/pmf/core.c
>> index d6af0ca036f1..52a68ca094be 100644
>> --- a/drivers/platform/x86/amd/pmf/core.c
>> +++ b/drivers/platform/x86/amd/pmf/core.c
>> @@ -53,6 +53,10 @@ static bool force_load;
>>   module_param(force_load, bool, 0444);
>>   MODULE_PARM_DESC(force_load, "Force load this driver on supported older platforms (experimental)");
>>
>> +bool pmf_manual_control;
>> +module_param_named(manual_control, pmf_manual_control, bool, 0444);
>> +MODULE_PARM_DESC(manual_control, "Expose manual control knobs (experimental)");
>> +
>>   static int amd_pmf_pwr_src_notify_call(struct notifier_block *nb, unsigned long event, void *data)
>>   {
>>          struct amd_pmf_dev *pmf = container_of(nb, struct amd_pmf_dev, pwr_src_notifier);
>> @@ -349,6 +353,10 @@ static void amd_pmf_init_features(struct amd_pmf_dev *dev)
>>                  dev_dbg(dev->dev, "SPS enabled and Platform Profiles registered\n");
>>          }
>>
>> +       if (pmf_manual_control) {
>> +               amd_pmf_init_manual_control(dev);
>> +               return;
>> +       }
>>          amd_pmf_init_smart_pc(dev);
>>          if (dev->smart_pc_enabled) {
>>                  dev_dbg(dev->dev, "Smart PC Solution Enabled\n");
>> @@ -485,6 +493,7 @@ static void amd_pmf_remove(struct platform_device *pdev)
>>
>>   static const struct attribute_group *amd_pmf_driver_groups[] = {
>>          &cnqf_feature_attribute_group,
>> +       &manual_attribute_group,
>>          NULL,
>>   };
>>
>> diff --git a/drivers/platform/x86/amd/pmf/manual.c b/drivers/platform/x86/amd/pmf/manual.c
>> new file mode 100644
>> index 000000000000..b33fc3cd8d61
>> --- /dev/null
>> +++ b/drivers/platform/x86/amd/pmf/manual.c
>> @@ -0,0 +1,88 @@
>> +// SPDX-License-Identifier: GPL-2.0-or-later
>> +/*
>> + * AMD Platform Management Framework Driver
>> + *
>> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
>> + * All Rights Reserved.
>> + *
>> + * Author: Mario Limonciello <mario.limonciello@amd.com>
>> + */
>> +
>> +#include "pmf.h"
>> +
>> +#define pmf_manual_attribute(_name, _set_command, _get_command)                \
>> +static ssize_t _name##_store(struct device *d,                         \
>> +                            struct device_attribute *attr,             \
>> +                            const char *buf, size_t count)             \
>> +{                                                                      \
>> +       struct amd_pmf_dev *dev = dev_get_drvdata(d);                   \
>> +       uint val;                                                       \
>> +                                                                       \
>> +       if (dev->current_profile != PLATFORM_PROFILE_CUSTOM) {          \
>> +               dev_warn_once(dev->dev,                                 \
>> +                             "Manual control is disabled, please set " \
>> +                             "platform profile to custom.\n");         \
>> +               return -EINVAL;                                         \
>> +       }                                                               \
>> +                                                                       \
>> +       if (kstrtouint(buf, 10, &val) < 0)                              \
>> +               return -EINVAL;                                         \
>> +                                                                       \
>> +       amd_pmf_send_cmd(dev, _set_command, false, val, NULL);          \
>> +                                                                       \
>> +       return count;                                                   \
>> +}                                                                      \
>> +static ssize_t _name##_show(struct device *d,                          \
>> +                          struct device_attribute *attr,               \
>> +                          char *buf)                                   \
>> +{                                                                      \
>> +       struct amd_pmf_dev *dev = dev_get_drvdata(d);                   \
>> +       uint val;                                                       \
>> +                                                                       \
>> +       amd_pmf_send_cmd(dev, _get_command, true, ARG_NONE, &val);      \
>> +                                                                       \
>> +       return sysfs_emit(buf, "%u\n", val);                            \
>> +}
>> +
>> +pmf_manual_attribute(spl, SET_SPL, GET_SPL);
>> +static DEVICE_ATTR_RW(spl);
>> +pmf_manual_attribute(fppt, SET_FPPT, GET_FPPT);
>> +static DEVICE_ATTR_RW(fppt);
>> +pmf_manual_attribute(sppt, SET_SPPT, GET_SPPT);
>> +static DEVICE_ATTR_RW(sppt);
>> +pmf_manual_attribute(sppt_apu_only, SET_SPPT_APU_ONLY, GET_SPPT_APU_ONLY);
>> +static DEVICE_ATTR_RW(sppt_apu_only);
>> +pmf_manual_attribute(stt_min, SET_STT_MIN_LIMIT, GET_STT_MIN_LIMIT);
>> +static DEVICE_ATTR_RW(stt_min);
>> +pmf_manual_attribute(stt_limit_apu, SET_STT_LIMIT_APU, GET_STT_LIMIT_APU);
>> +static DEVICE_ATTR_RW(stt_limit_apu);
>> +pmf_manual_attribute(stt_skin_temp, SET_STT_LIMIT_HS2, GET_STT_LIMIT_HS2);
>> +static DEVICE_ATTR_RW(stt_skin_temp);
>> +
>> +static umode_t manual_attr_is_visible(struct kobject *kobj, struct attribute *attr, int idx)
>> +{
>> +       return pmf_manual_control ? 0660 : 0;
>> +}
>> +
>> +static struct attribute *manual_attrs[] = {
>> +       &dev_attr_spl.attr,
>> +       &dev_attr_fppt.attr,
>> +       &dev_attr_sppt.attr,
>> +       &dev_attr_sppt_apu_only.attr,
>> +       &dev_attr_stt_min.attr,
>> +       &dev_attr_stt_limit_apu.attr,
>> +       &dev_attr_stt_skin_temp.attr,
>> +       NULL,
>> +};
>> +
>> +const struct attribute_group manual_attribute_group = {
>> +       .attrs = manual_attrs,
>> +       .is_visible = manual_attr_is_visible,
>> +};
>> +
>> +void amd_pmf_init_manual_control(struct amd_pmf_dev *dev)
>> +{
>> +       add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
>> +       pr_crit("Manual PMF control is enabled, please disable it before "
>> +               "reporting any bugs unrelated to PMF.\n");
>> +}
>> diff --git a/drivers/platform/x86/amd/pmf/pmf.h b/drivers/platform/x86/amd/pmf/pmf.h
>> index 8ce8816da9c1..ca3df63cf190 100644
>> --- a/drivers/platform/x86/amd/pmf/pmf.h
>> +++ b/drivers/platform/x86/amd/pmf/pmf.h
>> @@ -798,4 +798,9 @@ void amd_pmf_dump_ta_inputs(struct amd_pmf_dev *dev, struct ta_pmf_enact_table *
>>   /* Quirk infrastructure */
>>   void amd_pmf_quirks_init(struct amd_pmf_dev *dev);
>>
>> +/* Manual configuration */
>> +extern bool pmf_manual_control;
>> +extern const struct attribute_group manual_attribute_group;
>> +void amd_pmf_init_manual_control(struct amd_pmf_dev *dev);
>> +
>>   #endif /* PMF_H */
>> diff --git a/drivers/platform/x86/amd/pmf/sps.c b/drivers/platform/x86/amd/pmf/sps.c
>> index 92f7fb22277d..6db88e523a86 100644
>> --- a/drivers/platform/x86/amd/pmf/sps.c
>> +++ b/drivers/platform/x86/amd/pmf/sps.c
>> @@ -305,6 +305,8 @@ int amd_pmf_get_pprof_modes(struct amd_pmf_dev *pmf)
>>          case PLATFORM_PROFILE_LOW_POWER:
>>                  mode = POWER_MODE_POWER_SAVER;
>>                  break;
>> +       case PLATFORM_PROFILE_CUSTOM:
>> +               return 0;
>>          default:
>>                  dev_err(pmf->dev, "Unknown Platform Profile.\n");
>>                  return -EOPNOTSUPP;
>> @@ -412,6 +414,8 @@ int amd_pmf_init_sps(struct amd_pmf_dev *dev)
>>          set_bit(PLATFORM_PROFILE_LOW_POWER, dev->pprof.choices);
>>          set_bit(PLATFORM_PROFILE_BALANCED, dev->pprof.choices);
>>          set_bit(PLATFORM_PROFILE_PERFORMANCE, dev->pprof.choices);
>> +       if (pmf_manual_control)
>> +               set_bit(PLATFORM_PROFILE_CUSTOM, dev->pprof.choices);
>>
>>          /* Create platform_profile structure and register */
>>          err = platform_profile_register(&dev->pprof);
>> --
>> 2.43.0
>>
Antheas Kapenekakis Dec. 19, 2024, 3:24 p.m. UTC | #10
On Thu, 19 Dec 2024 at 15:50, Mario Limonciello <superm1@kernel.org> wrote:
>
> On 12/19/2024 07:12, Antheas Kapenekakis wrote:
> > Hi Mario,
> > given that there is a Legion Go driver in the works, and Asus already
> > has a driver, the only thing that would be left for locking down ACPI
> > access is manufacturers w/o vendor APIs.
> >
> > So, can we restart the conversation about this driver? It would be
> > nice to get to a place where we can lock down /dev/mem and ACPI by
> > spring.
>
> As Shyam mentioned we don't have control for limits by the PMF driver
> for this on PMF v2 (Strix) or later platforms.
>
> So if we were to revive this custom discussion it would only be for
> Phoenix and Hawk Point platforms.

That's unfortunate.

> >
> > Moreover, since the other two proposed drivers use the
> > firmware_attributes API, should this be used here as well?
>
> I do feel that if we revive this conversation specifically for Phoenix
> and Hawk Point platforms yes we should use the same API to expose it to
> userspace as those other two drivers do.
>
> I'd like Shyam's temperature on this idea though before anyone spends
> time on it.  If he's amenable would you want to work on it?

We currently expect the 2025 lineup to include a lot of Strix Point
handhelds, so I'd like a solution that works with that. OneXPlayer
released a model already, and GPD is getting ready to ship as well.

Yeah, I could throw some hours to it after I go through some overdue stuff.

> >
> > By the way, you were right about needing a taint for this. Strix Point
> > fails to enter a lower power state during sleep if you set it to lower
> > than 10W. This is not ideal, as hawk point could go down to 5 while
> > still showing a power difference, but I am unsure where this bug
> > should be reported. This is both through ryzenadj/ALIB
>
> Who is to say this is a bug?  Abusing a debugging interface with a
> reverse engineered tool means you might be able to configure a platform
> out of specifications.

The spec being 10+W would be very undesirable for handhelds with Strix
Point, so I'd hope somebody looks into it, esp. if it can be fixed
with a BIOS fw update before more handhelds come out. I can raise the
minimum TDP to 10W, with some user complaints.

Asus and Lenovo use the same mailbox so they'd share the issue too.

FYI for a typical handheld with e.g., a 60Wh battery, a 10W envelope
results in around 20-22W total consumption which is around 2.5 hours.
Hawk Point can be TDP limited down to 16W total consumption (TDP ~7W)
and can go down to 8W with frame limiting etc. I do not have numbers
for Strix Point yet, but to match Hawk Point it has to allow TDP to go
down to 7W. I think for 2025, customer expectation will be 6-8 hours+
at low wattages.

Antheas

> >
> > Antheas
> >
> > On Thu, 26 Sept 2024 at 05:00, Mario Limonciello <superm1@kernel.org> wrote:
> >>
> >> From: Mario Limonciello <mario.limonciello@amd.com>
> >>
> >> A number of users resort to using reverse engineered software like
> >> ryzenadj to manipulate debugging interfaces for modifying APU settings.
> >>
> >> At a glance these tools are useful, but the problem is they break
> >> state machines in other software such as the PMF driver or the OEM
> >> EC.
> >>
> >> Offer a knob for PMF to allow 'manual control' which will users can
> >> directly change things like fPPT and sPPT. As this can be harmful for
> >> a system to try to push limits outside of a thermal design, taint the
> >> kernel and show a critical message when in use.
> >>
> >> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> >> ---
> >>   Documentation/ABI/testing/sysfs-amd-pmf | 10 +++
> >>   drivers/platform/x86/amd/pmf/Makefile   |  1 +
> >>   drivers/platform/x86/amd/pmf/core.c     |  9 +++
> >>   drivers/platform/x86/amd/pmf/manual.c   | 88 +++++++++++++++++++++++++
> >>   drivers/platform/x86/amd/pmf/pmf.h      |  5 ++
> >>   drivers/platform/x86/amd/pmf/sps.c      |  4 ++
> >>   6 files changed, 117 insertions(+)
> >>   create mode 100644 drivers/platform/x86/amd/pmf/manual.c
> >>
> >> diff --git a/Documentation/ABI/testing/sysfs-amd-pmf b/Documentation/ABI/testing/sysfs-amd-pmf
> >> index 7fc0e1c2b76b..6f3d5cbf443f 100644
> >> --- a/Documentation/ABI/testing/sysfs-amd-pmf
> >> +++ b/Documentation/ABI/testing/sysfs-amd-pmf
> >> @@ -11,3 +11,13 @@ Description: Reading this file tells if the AMD Platform Management(PMF)
> >>                  To turn off CnQF user can write "off" to the sysfs node.
> >>                  Note: Systems that support auto mode will not have this sysfs file
> >>                  available.
> >> +
> >> +What:          /sys/devices/platform/*/{spl, fppt, sppt, sppt_apu_only, stt_min, stt_limit_apu, stt_skip_temp}
> >> +Date:          December 2024
> >> +Contact:       Mario Limonciello <mario.limonciello@amd.com>
> >> +Description:   Manual control of AMD PMF APU coefficients
> >> +               .
> >> +               These files are used to manually control the APU coefficients.
> >> +               In order to write to these files the module most be
> >> +               loaded with manual_control=1 and the user must write "custom"
> >> +               to the ACPI platform profile.
> >> diff --git a/drivers/platform/x86/amd/pmf/Makefile b/drivers/platform/x86/amd/pmf/Makefile
> >> index 7d6079b02589..81444d6f4428 100644
> >> --- a/drivers/platform/x86/amd/pmf/Makefile
> >> +++ b/drivers/platform/x86/amd/pmf/Makefile
> >> @@ -7,4 +7,5 @@
> >>   obj-$(CONFIG_AMD_PMF) += amd-pmf.o
> >>   amd-pmf-objs := core.o acpi.o sps.o \
> >>                  auto-mode.o cnqf.o \
> >> +               manual.o \
> >>                  tee-if.o spc.o pmf-quirks.o
> >> diff --git a/drivers/platform/x86/amd/pmf/core.c b/drivers/platform/x86/amd/pmf/core.c
> >> index d6af0ca036f1..52a68ca094be 100644
> >> --- a/drivers/platform/x86/amd/pmf/core.c
> >> +++ b/drivers/platform/x86/amd/pmf/core.c
> >> @@ -53,6 +53,10 @@ static bool force_load;
> >>   module_param(force_load, bool, 0444);
> >>   MODULE_PARM_DESC(force_load, "Force load this driver on supported older platforms (experimental)");
> >>
> >> +bool pmf_manual_control;
> >> +module_param_named(manual_control, pmf_manual_control, bool, 0444);
> >> +MODULE_PARM_DESC(manual_control, "Expose manual control knobs (experimental)");
> >> +
> >>   static int amd_pmf_pwr_src_notify_call(struct notifier_block *nb, unsigned long event, void *data)
> >>   {
> >>          struct amd_pmf_dev *pmf = container_of(nb, struct amd_pmf_dev, pwr_src_notifier);
> >> @@ -349,6 +353,10 @@ static void amd_pmf_init_features(struct amd_pmf_dev *dev)
> >>                  dev_dbg(dev->dev, "SPS enabled and Platform Profiles registered\n");
> >>          }
> >>
> >> +       if (pmf_manual_control) {
> >> +               amd_pmf_init_manual_control(dev);
> >> +               return;
> >> +       }
> >>          amd_pmf_init_smart_pc(dev);
> >>          if (dev->smart_pc_enabled) {
> >>                  dev_dbg(dev->dev, "Smart PC Solution Enabled\n");
> >> @@ -485,6 +493,7 @@ static void amd_pmf_remove(struct platform_device *pdev)
> >>
> >>   static const struct attribute_group *amd_pmf_driver_groups[] = {
> >>          &cnqf_feature_attribute_group,
> >> +       &manual_attribute_group,
> >>          NULL,
> >>   };
> >>
> >> diff --git a/drivers/platform/x86/amd/pmf/manual.c b/drivers/platform/x86/amd/pmf/manual.c
> >> new file mode 100644
> >> index 000000000000..b33fc3cd8d61
> >> --- /dev/null
> >> +++ b/drivers/platform/x86/amd/pmf/manual.c
> >> @@ -0,0 +1,88 @@
> >> +// SPDX-License-Identifier: GPL-2.0-or-later
> >> +/*
> >> + * AMD Platform Management Framework Driver
> >> + *
> >> + * Copyright (c) 2024, Advanced Micro Devices, Inc.
> >> + * All Rights Reserved.
> >> + *
> >> + * Author: Mario Limonciello <mario.limonciello@amd.com>
> >> + */
> >> +
> >> +#include "pmf.h"
> >> +
> >> +#define pmf_manual_attribute(_name, _set_command, _get_command)                \
> >> +static ssize_t _name##_store(struct device *d,                         \
> >> +                            struct device_attribute *attr,             \
> >> +                            const char *buf, size_t count)             \
> >> +{                                                                      \
> >> +       struct amd_pmf_dev *dev = dev_get_drvdata(d);                   \
> >> +       uint val;                                                       \
> >> +                                                                       \
> >> +       if (dev->current_profile != PLATFORM_PROFILE_CUSTOM) {          \
> >> +               dev_warn_once(dev->dev,                                 \
> >> +                             "Manual control is disabled, please set " \
> >> +                             "platform profile to custom.\n");         \
> >> +               return -EINVAL;                                         \
> >> +       }                                                               \
> >> +                                                                       \
> >> +       if (kstrtouint(buf, 10, &val) < 0)                              \
> >> +               return -EINVAL;                                         \
> >> +                                                                       \
> >> +       amd_pmf_send_cmd(dev, _set_command, false, val, NULL);          \
> >> +                                                                       \
> >> +       return count;                                                   \
> >> +}                                                                      \
> >> +static ssize_t _name##_show(struct device *d,                          \
> >> +                          struct device_attribute *attr,               \
> >> +                          char *buf)                                   \
> >> +{                                                                      \
> >> +       struct amd_pmf_dev *dev = dev_get_drvdata(d);                   \
> >> +       uint val;                                                       \
> >> +                                                                       \
> >> +       amd_pmf_send_cmd(dev, _get_command, true, ARG_NONE, &val);      \
> >> +                                                                       \
> >> +       return sysfs_emit(buf, "%u\n", val);                            \
> >> +}
> >> +
> >> +pmf_manual_attribute(spl, SET_SPL, GET_SPL);
> >> +static DEVICE_ATTR_RW(spl);
> >> +pmf_manual_attribute(fppt, SET_FPPT, GET_FPPT);
> >> +static DEVICE_ATTR_RW(fppt);
> >> +pmf_manual_attribute(sppt, SET_SPPT, GET_SPPT);
> >> +static DEVICE_ATTR_RW(sppt);
> >> +pmf_manual_attribute(sppt_apu_only, SET_SPPT_APU_ONLY, GET_SPPT_APU_ONLY);
> >> +static DEVICE_ATTR_RW(sppt_apu_only);
> >> +pmf_manual_attribute(stt_min, SET_STT_MIN_LIMIT, GET_STT_MIN_LIMIT);
> >> +static DEVICE_ATTR_RW(stt_min);
> >> +pmf_manual_attribute(stt_limit_apu, SET_STT_LIMIT_APU, GET_STT_LIMIT_APU);
> >> +static DEVICE_ATTR_RW(stt_limit_apu);
> >> +pmf_manual_attribute(stt_skin_temp, SET_STT_LIMIT_HS2, GET_STT_LIMIT_HS2);
> >> +static DEVICE_ATTR_RW(stt_skin_temp);
> >> +
> >> +static umode_t manual_attr_is_visible(struct kobject *kobj, struct attribute *attr, int idx)
> >> +{
> >> +       return pmf_manual_control ? 0660 : 0;
> >> +}
> >> +
> >> +static struct attribute *manual_attrs[] = {
> >> +       &dev_attr_spl.attr,
> >> +       &dev_attr_fppt.attr,
> >> +       &dev_attr_sppt.attr,
> >> +       &dev_attr_sppt_apu_only.attr,
> >> +       &dev_attr_stt_min.attr,
> >> +       &dev_attr_stt_limit_apu.attr,
> >> +       &dev_attr_stt_skin_temp.attr,
> >> +       NULL,
> >> +};
> >> +
> >> +const struct attribute_group manual_attribute_group = {
> >> +       .attrs = manual_attrs,
> >> +       .is_visible = manual_attr_is_visible,
> >> +};
> >> +
> >> +void amd_pmf_init_manual_control(struct amd_pmf_dev *dev)
> >> +{
> >> +       add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
> >> +       pr_crit("Manual PMF control is enabled, please disable it before "
> >> +               "reporting any bugs unrelated to PMF.\n");
> >> +}
> >> diff --git a/drivers/platform/x86/amd/pmf/pmf.h b/drivers/platform/x86/amd/pmf/pmf.h
> >> index 8ce8816da9c1..ca3df63cf190 100644
> >> --- a/drivers/platform/x86/amd/pmf/pmf.h
> >> +++ b/drivers/platform/x86/amd/pmf/pmf.h
> >> @@ -798,4 +798,9 @@ void amd_pmf_dump_ta_inputs(struct amd_pmf_dev *dev, struct ta_pmf_enact_table *
> >>   /* Quirk infrastructure */
> >>   void amd_pmf_quirks_init(struct amd_pmf_dev *dev);
> >>
> >> +/* Manual configuration */
> >> +extern bool pmf_manual_control;
> >> +extern const struct attribute_group manual_attribute_group;
> >> +void amd_pmf_init_manual_control(struct amd_pmf_dev *dev);
> >> +
> >>   #endif /* PMF_H */
> >> diff --git a/drivers/platform/x86/amd/pmf/sps.c b/drivers/platform/x86/amd/pmf/sps.c
> >> index 92f7fb22277d..6db88e523a86 100644
> >> --- a/drivers/platform/x86/amd/pmf/sps.c
> >> +++ b/drivers/platform/x86/amd/pmf/sps.c
> >> @@ -305,6 +305,8 @@ int amd_pmf_get_pprof_modes(struct amd_pmf_dev *pmf)
> >>          case PLATFORM_PROFILE_LOW_POWER:
> >>                  mode = POWER_MODE_POWER_SAVER;
> >>                  break;
> >> +       case PLATFORM_PROFILE_CUSTOM:
> >> +               return 0;
> >>          default:
> >>                  dev_err(pmf->dev, "Unknown Platform Profile.\n");
> >>                  return -EOPNOTSUPP;
> >> @@ -412,6 +414,8 @@ int amd_pmf_init_sps(struct amd_pmf_dev *dev)
> >>          set_bit(PLATFORM_PROFILE_LOW_POWER, dev->pprof.choices);
> >>          set_bit(PLATFORM_PROFILE_BALANCED, dev->pprof.choices);
> >>          set_bit(PLATFORM_PROFILE_PERFORMANCE, dev->pprof.choices);
> >> +       if (pmf_manual_control)
> >> +               set_bit(PLATFORM_PROFILE_CUSTOM, dev->pprof.choices);
> >>
> >>          /* Create platform_profile structure and register */
> >>          err = platform_profile_register(&dev->pprof);
> >> --
> >> 2.43.0
> >>
>
Mario Limonciello Dec. 19, 2024, 4:13 p.m. UTC | #11
On 12/19/2024 09:24, Antheas Kapenekakis wrote:
> On Thu, 19 Dec 2024 at 15:50, Mario Limonciello <superm1@kernel.org> wrote:
>>
>> On 12/19/2024 07:12, Antheas Kapenekakis wrote:
>>> Hi Mario,
>>> given that there is a Legion Go driver in the works, and Asus already
>>> has a driver, the only thing that would be left for locking down ACPI
>>> access is manufacturers w/o vendor APIs.
>>>
>>> So, can we restart the conversation about this driver? It would be
>>> nice to get to a place where we can lock down /dev/mem and ACPI by
>>> spring.
>>
>> As Shyam mentioned we don't have control for limits by the PMF driver
>> for this on PMF v2 (Strix) or later platforms.
>>
>> So if we were to revive this custom discussion it would only be for
>> Phoenix and Hawk Point platforms.
> 
> That's unfortunate.
> 
>>>
>>> Moreover, since the other two proposed drivers use the
>>> firmware_attributes API, should this be used here as well?
>>
>> I do feel that if we revive this conversation specifically for Phoenix
>> and Hawk Point platforms yes we should use the same API to expose it to
>> userspace as those other two drivers do.
>>
>> I'd like Shyam's temperature on this idea though before anyone spends
>> time on it.  If he's amenable would you want to work on it?
> 
> We currently expect the 2025 lineup to include a lot of Strix Point
> handhelds, so I'd like a solution that works with that. OneXPlayer
> released a model already, and GPD is getting ready to ship as well.
> 
> Yeah, I could throw some hours to it after I go through some overdue stuff.
> 
>>>
>>> By the way, you were right about needing a taint for this. Strix Point
>>> fails to enter a lower power state during sleep if you set it to lower
>>> than 10W. This is not ideal, as hawk point could go down to 5 while
>>> still showing a power difference, but I am unsure where this bug
>>> should be reported. This is both through ryzenadj/ALIB
>>
>> Who is to say this is a bug?  Abusing a debugging interface with a
>> reverse engineered tool means you might be able to configure a platform
>> out of specifications.
> 
> The spec being 10+W would be very undesirable for handhelds with Strix
> Point, so I'd hope somebody looks into it, esp. if it can be fixed
> with a BIOS fw update before more handhelds come out. I can raise the
> minimum TDP to 10W, with some user complaints.
> 
> Asus and Lenovo use the same mailbox so they'd share the issue too.
> 
> FYI for a typical handheld with e.g., a 60Wh battery, a 10W envelope
> results in around 20-22W total consumption which is around 2.5 hours.
> Hawk Point can be TDP limited down to 16W total consumption (TDP ~7W)
> and can go down to 8W with frame limiting etc. I do not have numbers
> for Strix Point yet, but to match Hawk Point it has to allow TDP to go
> down to 7W. I think for 2025, customer expectation will be 6-8 hours+
> at low wattages.
> 

I've got a fundamental question - why the fixation on PPT?

This just sets "limits" for the package.  In Windows it's probably the 
best knob to tune to adjust performance in an effort to extend battery 
life, but in Linux we have a lot of other knobs:

* the ability to tune EPP (energy_performance_preference)
* set min and max CPU frequencies (scaling_min_freq, scaling_max_freq)
* offline cores at will
* change DPM setting in the GPU driver (power_dpm_force_performance_level)

All the core related knobs can be changed on a per-core basis.  So for 
example even on a non-heterogeneous design you could potentially make it 
perform "like" a hetero design where you set it so that some cores don't 
go above nominal frequency or the EPP value is tuned less aggressively 
on some cores.

These knobs can have just as drastic of a result on battery life as 
adjusting the various power limiting knobs.  Most importantly these 
knobs have architectural limits that you won't be able to override so 
you can safely change them to min/max and see what happens.

I feel like specifically if you keep EPP at balance_performance, keep 
scaling_min_freq at lowest non linear frequency and change 
scaling_max_freq on a few of the cores you should be able to influence 
the battery life quite a bit while still keeping the system responsive.
Antheas Kapenekakis Dec. 19, 2024, 9:10 p.m. UTC | #12
On Thu, 19 Dec 2024 at 17:14, Mario Limonciello <superm1@kernel.org> wrote:
>
> On 12/19/2024 09:24, Antheas Kapenekakis wrote:
> > On Thu, 19 Dec 2024 at 15:50, Mario Limonciello <superm1@kernel.org> wrote:
> >>
> >> On 12/19/2024 07:12, Antheas Kapenekakis wrote:
> >>> Hi Mario,
> >>> given that there is a Legion Go driver in the works, and Asus already
> >>> has a driver, the only thing that would be left for locking down ACPI
> >>> access is manufacturers w/o vendor APIs.
> >>>
> >>> So, can we restart the conversation about this driver? It would be
> >>> nice to get to a place where we can lock down /dev/mem and ACPI by
> >>> spring.
> >>
> >> As Shyam mentioned we don't have control for limits by the PMF driver
> >> for this on PMF v2 (Strix) or later platforms.
> >>
> >> So if we were to revive this custom discussion it would only be for
> >> Phoenix and Hawk Point platforms.
> >
> > That's unfortunate.
> >
> >>>
> >>> Moreover, since the other two proposed drivers use the
> >>> firmware_attributes API, should this be used here as well?
> >>
> >> I do feel that if we revive this conversation specifically for Phoenix
> >> and Hawk Point platforms yes we should use the same API to expose it to
> >> userspace as those other two drivers do.
> >>
> >> I'd like Shyam's temperature on this idea though before anyone spends
> >> time on it.  If he's amenable would you want to work on it?
> >
> > We currently expect the 2025 lineup to include a lot of Strix Point
> > handhelds, so I'd like a solution that works with that. OneXPlayer
> > released a model already, and GPD is getting ready to ship as well.
> >
> > Yeah, I could throw some hours to it after I go through some overdue stuff.
> >
> >>>
> >>> By the way, you were right about needing a taint for this. Strix Point
> >>> fails to enter a lower power state during sleep if you set it to lower
> >>> than 10W. This is not ideal, as hawk point could go down to 5 while
> >>> still showing a power difference, but I am unsure where this bug
> >>> should be reported. This is both through ryzenadj/ALIB
> >>
> >> Who is to say this is a bug?  Abusing a debugging interface with a
> >> reverse engineered tool means you might be able to configure a platform
> >> out of specifications.
> >
> > The spec being 10+W would be very undesirable for handhelds with Strix
> > Point, so I'd hope somebody looks into it, esp. if it can be fixed
> > with a BIOS fw update before more handhelds come out. I can raise the
> > minimum TDP to 10W, with some user complaints.
> >
> > Asus and Lenovo use the same mailbox so they'd share the issue too.
> >
> > FYI for a typical handheld with e.g., a 60Wh battery, a 10W envelope
> > results in around 20-22W total consumption which is around 2.5 hours.
> > Hawk Point can be TDP limited down to 16W total consumption (TDP ~7W)
> > and can go down to 8W with frame limiting etc. I do not have numbers
> > for Strix Point yet, but to match Hawk Point it has to allow TDP to go
> > down to 7W. I think for 2025, customer expectation will be 6-8 hours+
> > at low wattages.
> >
>
> I've got a fundamental question - why the fixation on PPT?
>
> This just sets "limits" for the package.  In Windows it's probably the
> best knob to tune to adjust performance in an effort to extend battery
> life, but in Linux we have a lot of other knobs:
>
> * the ability to tune EPP (energy_performance_preference)
> * set min and max CPU frequencies (scaling_min_freq, scaling_max_freq)

We use both of these.

> * offline cores at will

if a core is parked and you try to write into its sysfs entrypoints,
we found that this might cause a userspace program to hang
indefinitely. Since a lot of settings are per core that's problematic
and since it does not help much most TDP programs dont offer it
anymore.

> * change DPM setting in the GPU driver (power_dpm_force_performance_level)

I think we played with this mostly to try to get lower than 800mhz.
However, going lower than 800mhz in these APUs causes issues.

> All the core related knobs can be changed on a per-core basis.  So for
> example even on a non-heterogeneous design you could potentially make it
> perform "like" a hetero design where you set it so that some cores don't
> go above nominal frequency or the EPP value is tuned less aggressively
> on some cores.

> These knobs can have just as drastic of a result on battery life as
> adjusting the various power limiting knobs.  Most importantly these
> knobs have architectural limits that you won't be able to override so
> you can safely change them to min/max and see what happens.

I feel like we are discussing different targets here. When it comes to
computing tasks, you have a certain block of work that needs to be
done and after that the CPU is free. In this case, programs like tuned
(allegedly) optimize these settings so that they take the minimum
amount of power to complete that block of work.

However, games are different. Games have no problem burning power if
you let them and they are also playable at a variety of power levels.
Typically, unless the user caps the framerate and video quality of the
game it will use the full slow temp limit value. Even if they do set
that, the game will typically burn 3-4W more than what is needed
depending on TDP, EPP etc.

Therefore, the question we ask users is how loud do you want your
device to be and how long (in hours) do you want the battery to last.
This is done by ppt + the other settings, which are set automatically
based on ppt.

Then the users can compromise with what fidelity and fps they get
based on their TDP.

> I feel like specifically if you keep EPP at balance_performance, keep
> scaling_min_freq at lowest non linear frequency and change
> scaling_max_freq on a few of the cores you should be able to influence
> the battery life quite a bit while still keeping the system responsive.

The sweet spot on these APUs is max freq to be nominal (i.e., no
boost), EPP at balance_power or power for very low tdps, and min_freq
to be 0. Especially for min_freq, setting it to lowest non-linear
seems to have no effect.

Antheas
Mario Limonciello Dec. 19, 2024, 9:21 p.m. UTC | #13
On 12/19/2024 15:10, Antheas Kapenekakis wrote:
> On Thu, 19 Dec 2024 at 17:14, Mario Limonciello <superm1@kernel.org> wrote:
>>
>> On 12/19/2024 09:24, Antheas Kapenekakis wrote:
>>> On Thu, 19 Dec 2024 at 15:50, Mario Limonciello <superm1@kernel.org> wrote:
>>>>
>>>> On 12/19/2024 07:12, Antheas Kapenekakis wrote:
>>>>> Hi Mario,
>>>>> given that there is a Legion Go driver in the works, and Asus already
>>>>> has a driver, the only thing that would be left for locking down ACPI
>>>>> access is manufacturers w/o vendor APIs.
>>>>>
>>>>> So, can we restart the conversation about this driver? It would be
>>>>> nice to get to a place where we can lock down /dev/mem and ACPI by
>>>>> spring.
>>>>
>>>> As Shyam mentioned we don't have control for limits by the PMF driver
>>>> for this on PMF v2 (Strix) or later platforms.
>>>>
>>>> So if we were to revive this custom discussion it would only be for
>>>> Phoenix and Hawk Point platforms.
>>>
>>> That's unfortunate.
>>>
>>>>>
>>>>> Moreover, since the other two proposed drivers use the
>>>>> firmware_attributes API, should this be used here as well?
>>>>
>>>> I do feel that if we revive this conversation specifically for Phoenix
>>>> and Hawk Point platforms yes we should use the same API to expose it to
>>>> userspace as those other two drivers do.
>>>>
>>>> I'd like Shyam's temperature on this idea though before anyone spends
>>>> time on it.  If he's amenable would you want to work on it?
>>>
>>> We currently expect the 2025 lineup to include a lot of Strix Point
>>> handhelds, so I'd like a solution that works with that. OneXPlayer
>>> released a model already, and GPD is getting ready to ship as well.
>>>
>>> Yeah, I could throw some hours to it after I go through some overdue stuff.
>>>
>>>>>
>>>>> By the way, you were right about needing a taint for this. Strix Point
>>>>> fails to enter a lower power state during sleep if you set it to lower
>>>>> than 10W. This is not ideal, as hawk point could go down to 5 while
>>>>> still showing a power difference, but I am unsure where this bug
>>>>> should be reported. This is both through ryzenadj/ALIB
>>>>
>>>> Who is to say this is a bug?  Abusing a debugging interface with a
>>>> reverse engineered tool means you might be able to configure a platform
>>>> out of specifications.
>>>
>>> The spec being 10+W would be very undesirable for handhelds with Strix
>>> Point, so I'd hope somebody looks into it, esp. if it can be fixed
>>> with a BIOS fw update before more handhelds come out. I can raise the
>>> minimum TDP to 10W, with some user complaints.
>>>
>>> Asus and Lenovo use the same mailbox so they'd share the issue too.
>>>
>>> FYI for a typical handheld with e.g., a 60Wh battery, a 10W envelope
>>> results in around 20-22W total consumption which is around 2.5 hours.
>>> Hawk Point can be TDP limited down to 16W total consumption (TDP ~7W)
>>> and can go down to 8W with frame limiting etc. I do not have numbers
>>> for Strix Point yet, but to match Hawk Point it has to allow TDP to go
>>> down to 7W. I think for 2025, customer expectation will be 6-8 hours+
>>> at low wattages.
>>>
>>
>> I've got a fundamental question - why the fixation on PPT?
>>
>> This just sets "limits" for the package.  In Windows it's probably the
>> best knob to tune to adjust performance in an effort to extend battery
>> life, but in Linux we have a lot of other knobs:
>>
>> * the ability to tune EPP (energy_performance_preference)
>> * set min and max CPU frequencies (scaling_min_freq, scaling_max_freq)
> 
> We use both of these.
> 
>> * offline cores at will
> 
> if a core is parked and you try to write into its sysfs entrypoints,
> we found that this might cause a userspace program to hang
> indefinitely. Since a lot of settings are per core that's problematic
> and since it does not help much most TDP programs dont offer it
> anymore.

This sounds like a kernel bug if you're hanging programs when trying to 
write to sysfs files of offlined cores.  If we can get that fixed having 
that in your toolbelt is quite useful.  I'm sure there are plenty of 
games that don't really need all the cores up and you can save some power.

Can you get a simple reproducer for me into a bug report to look at next 
year?

> 
>> * change DPM setting in the GPU driver (power_dpm_force_performance_level)
> 
> I think we played with this mostly to try to get lower than 800mhz.
> However, going lower than 800mhz in these APUs causes issues.
> 
>> All the core related knobs can be changed on a per-core basis.  So for
>> example even on a non-heterogeneous design you could potentially make it
>> perform "like" a hetero design where you set it so that some cores don't
>> go above nominal frequency or the EPP value is tuned less aggressively
>> on some cores.
> 
>> These knobs can have just as drastic of a result on battery life as
>> adjusting the various power limiting knobs.  Most importantly these
>> knobs have architectural limits that you won't be able to override so
>> you can safely change them to min/max and see what happens.
> 
> I feel like we are discussing different targets here. When it comes to
> computing tasks, you have a certain block of work that needs to be
> done and after that the CPU is free. In this case, programs like tuned
> (allegedly) optimize these settings so that they take the minimum
> amount of power to complete that block of work.
> 
> However, games are different. Games have no problem burning power if
> you let them and they are also playable at a variety of power levels.
> Typically, unless the user caps the framerate and video quality of the
> game it will use the full slow temp limit value. Even if they do set
> that, the game will typically burn 3-4W more than what is needed
> depending on TDP, EPP etc.

Part of what I'm wondering is if our 4 levels of EPP values "aren't 
enough" for optimization on a per game basis.

IMO They're incredibly rigid.  I do have a patch that can expose "raw" 
numbers for amd-pstate like intel-pstate does, but I haven't brought it 
on the lists yet because I'm still discussing it with others internal to 
AMD.

EPP is really about responsiveness in games.

> 
> Therefore, the question we ask users is how loud do you want your
> device to be and how long (in hours) do you want the battery to last.
> This is done by ppt + the other settings, which are set automatically
> based on ppt.
> 
> Then the users can compromise with what fidelity and fps they get
> based on their TDP.
> 
>> I feel like specifically if you keep EPP at balance_performance, keep
>> scaling_min_freq at lowest non linear frequency and change
>> scaling_max_freq on a few of the cores you should be able to influence
>> the battery life quite a bit while still keeping the system responsive.
> 
> The sweet spot on these APUs is max freq to be nominal (i.e., no
> boost), EPP at balance_power or power for very low tdps, and min_freq
> to be 0. Especially for min_freq, setting it to lowest non-linear
> seems to have no effect.

Min freq can't go to 0, but maybe you mean the unitless perf value of 0, 
right?

There is a pretty big current swing you'll have going from perf 0 vs the 
swing you get from lowest nonlinear perf.  It might not be visually 
noticeable, but I think it would be good to characterize how many joules 
are used for a given predictable gaming "workload" to decide what to do.
Antheas Kapenekakis Dec. 19, 2024, 9:27 p.m. UTC | #14
On Thu, 19 Dec 2024 at 22:21, Mario Limonciello <superm1@kernel.org> wrote:
>
> On 12/19/2024 15:10, Antheas Kapenekakis wrote:
> > On Thu, 19 Dec 2024 at 17:14, Mario Limonciello <superm1@kernel.org> wrote:
> >>
> >> On 12/19/2024 09:24, Antheas Kapenekakis wrote:
> >>> On Thu, 19 Dec 2024 at 15:50, Mario Limonciello <superm1@kernel.org> wrote:
> >>>>
> >>>> On 12/19/2024 07:12, Antheas Kapenekakis wrote:
> >>>>> Hi Mario,
> >>>>> given that there is a Legion Go driver in the works, and Asus already
> >>>>> has a driver, the only thing that would be left for locking down ACPI
> >>>>> access is manufacturers w/o vendor APIs.
> >>>>>
> >>>>> So, can we restart the conversation about this driver? It would be
> >>>>> nice to get to a place where we can lock down /dev/mem and ACPI by
> >>>>> spring.
> >>>>
> >>>> As Shyam mentioned we don't have control for limits by the PMF driver
> >>>> for this on PMF v2 (Strix) or later platforms.
> >>>>
> >>>> So if we were to revive this custom discussion it would only be for
> >>>> Phoenix and Hawk Point platforms.
> >>>
> >>> That's unfortunate.
> >>>
> >>>>>
> >>>>> Moreover, since the other two proposed drivers use the
> >>>>> firmware_attributes API, should this be used here as well?
> >>>>
> >>>> I do feel that if we revive this conversation specifically for Phoenix
> >>>> and Hawk Point platforms yes we should use the same API to expose it to
> >>>> userspace as those other two drivers do.
> >>>>
> >>>> I'd like Shyam's temperature on this idea though before anyone spends
> >>>> time on it.  If he's amenable would you want to work on it?
> >>>
> >>> We currently expect the 2025 lineup to include a lot of Strix Point
> >>> handhelds, so I'd like a solution that works with that. OneXPlayer
> >>> released a model already, and GPD is getting ready to ship as well.
> >>>
> >>> Yeah, I could throw some hours to it after I go through some overdue stuff.
> >>>
> >>>>>
> >>>>> By the way, you were right about needing a taint for this. Strix Point
> >>>>> fails to enter a lower power state during sleep if you set it to lower
> >>>>> than 10W. This is not ideal, as hawk point could go down to 5 while
> >>>>> still showing a power difference, but I am unsure where this bug
> >>>>> should be reported. This is both through ryzenadj/ALIB
> >>>>
> >>>> Who is to say this is a bug?  Abusing a debugging interface with a
> >>>> reverse engineered tool means you might be able to configure a platform
> >>>> out of specifications.
> >>>
> >>> The spec being 10+W would be very undesirable for handhelds with Strix
> >>> Point, so I'd hope somebody looks into it, esp. if it can be fixed
> >>> with a BIOS fw update before more handhelds come out. I can raise the
> >>> minimum TDP to 10W, with some user complaints.
> >>>
> >>> Asus and Lenovo use the same mailbox so they'd share the issue too.
> >>>
> >>> FYI for a typical handheld with e.g., a 60Wh battery, a 10W envelope
> >>> results in around 20-22W total consumption which is around 2.5 hours.
> >>> Hawk Point can be TDP limited down to 16W total consumption (TDP ~7W)
> >>> and can go down to 8W with frame limiting etc. I do not have numbers
> >>> for Strix Point yet, but to match Hawk Point it has to allow TDP to go
> >>> down to 7W. I think for 2025, customer expectation will be 6-8 hours+
> >>> at low wattages.
> >>>
> >>
> >> I've got a fundamental question - why the fixation on PPT?
> >>
> >> This just sets "limits" for the package.  In Windows it's probably the
> >> best knob to tune to adjust performance in an effort to extend battery
> >> life, but in Linux we have a lot of other knobs:
> >>
> >> * the ability to tune EPP (energy_performance_preference)
> >> * set min and max CPU frequencies (scaling_min_freq, scaling_max_freq)
> >
> > We use both of these.
> >
> >> * offline cores at will
> >
> > if a core is parked and you try to write into its sysfs entrypoints,
> > we found that this might cause a userspace program to hang
> > indefinitely. Since a lot of settings are per core that's problematic
> > and since it does not help much most TDP programs dont offer it
> > anymore.
>
> This sounds like a kernel bug if you're hanging programs when trying to
> write to sysfs files of offlined cores.  If we can get that fixed having
> that in your toolbelt is quite useful.  I'm sure there are plenty of
> games that don't really need all the cores up and you can save some power.
>
> Can you get a simple reproducer for me into a bug report to look at next
> year?

I will try to. This was relayed to me. Disabling SMT also causes a
crash on the Ally when going to sleep.

> >
> >> * change DPM setting in the GPU driver (power_dpm_force_performance_level)
> >
> > I think we played with this mostly to try to get lower than 800mhz.
> > However, going lower than 800mhz in these APUs causes issues.
> >
> >> All the core related knobs can be changed on a per-core basis.  So for
> >> example even on a non-heterogeneous design you could potentially make it
> >> perform "like" a hetero design where you set it so that some cores don't
> >> go above nominal frequency or the EPP value is tuned less aggressively
> >> on some cores.
> >
> >> These knobs can have just as drastic of a result on battery life as
> >> adjusting the various power limiting knobs.  Most importantly these
> >> knobs have architectural limits that you won't be able to override so
> >> you can safely change them to min/max and see what happens.
> >
> > I feel like we are discussing different targets here. When it comes to
> > computing tasks, you have a certain block of work that needs to be
> > done and after that the CPU is free. In this case, programs like tuned
> > (allegedly) optimize these settings so that they take the minimum
> > amount of power to complete that block of work.
> >
> > However, games are different. Games have no problem burning power if
> > you let them and they are also playable at a variety of power levels.
> > Typically, unless the user caps the framerate and video quality of the
> > game it will use the full slow temp limit value. Even if they do set
> > that, the game will typically burn 3-4W more than what is needed
> > depending on TDP, EPP etc.
>
> Part of what I'm wondering is if our 4 levels of EPP values "aren't
> enough" for optimization on a per game basis.
>
> IMO They're incredibly rigid.  I do have a patch that can expose "raw"
> numbers for amd-pstate like intel-pstate does, but I haven't brought it
> on the lists yet because I'm still discussing it with others internal to
> AMD.
>
> EPP is really about responsiveness in games.

EPP performance is so detrimental we hide it. It destroys performance
by sucking power from the GPU. EPP balance_performance is only useful
in certain emulators that need a lot of CPU. Only balance_power is
useful. Then, for TDPs lower than 10, setting EPP to power milks
another 1-2W

> >
> > Therefore, the question we ask users is how loud do you want your
> > device to be and how long (in hours) do you want the battery to last.
> > This is done by ppt + the other settings, which are set automatically
> > based on ppt.
> >
> > Then the users can compromise with what fidelity and fps they get
> > based on their TDP.
> >
> >> I feel like specifically if you keep EPP at balance_performance, keep
> >> scaling_min_freq at lowest non linear frequency and change
> >> scaling_max_freq on a few of the cores you should be able to influence
> >> the battery life quite a bit while still keeping the system responsive.
> >
> > The sweet spot on these APUs is max freq to be nominal (i.e., no
> > boost), EPP at balance_power or power for very low tdps, and min_freq
> > to be 0. Especially for min_freq, setting it to lowest non-linear
> > seems to have no effect.
>
> Min freq can't go to 0, but maybe you mean the unitless perf value of 0,
> right?
>
> There is a pretty big current swing you'll have going from perf 0 vs the
> swing you get from lowest nonlinear perf.  It might not be visually
> noticeable, but I think it would be good to characterize how many joules
> are used for a given predictable gaming "workload" to decide what to do.

I set min_freq for the CPUs to the minimum one which maybe you are
right it is 400mhz. When it comes to games, 1.4GHz (nonlinear) vs
400mhz does not make much of a difference.

Antheas
Mario Limonciello Dec. 19, 2024, 9:35 p.m. UTC | #15
On 12/19/2024 15:27, Antheas Kapenekakis wrote:
> On Thu, 19 Dec 2024 at 22:21, Mario Limonciello <superm1@kernel.org> wrote:
>>
>> On 12/19/2024 15:10, Antheas Kapenekakis wrote:
>>> On Thu, 19 Dec 2024 at 17:14, Mario Limonciello <superm1@kernel.org> wrote:
>>>>
>>>> On 12/19/2024 09:24, Antheas Kapenekakis wrote:
>>>>> On Thu, 19 Dec 2024 at 15:50, Mario Limonciello <superm1@kernel.org> wrote:
>>>>>>
>>>>>> On 12/19/2024 07:12, Antheas Kapenekakis wrote:
>>>>>>> Hi Mario,
>>>>>>> given that there is a Legion Go driver in the works, and Asus already
>>>>>>> has a driver, the only thing that would be left for locking down ACPI
>>>>>>> access is manufacturers w/o vendor APIs.
>>>>>>>
>>>>>>> So, can we restart the conversation about this driver? It would be
>>>>>>> nice to get to a place where we can lock down /dev/mem and ACPI by
>>>>>>> spring.
>>>>>>
>>>>>> As Shyam mentioned we don't have control for limits by the PMF driver
>>>>>> for this on PMF v2 (Strix) or later platforms.
>>>>>>
>>>>>> So if we were to revive this custom discussion it would only be for
>>>>>> Phoenix and Hawk Point platforms.
>>>>>
>>>>> That's unfortunate.
>>>>>
>>>>>>>
>>>>>>> Moreover, since the other two proposed drivers use the
>>>>>>> firmware_attributes API, should this be used here as well?
>>>>>>
>>>>>> I do feel that if we revive this conversation specifically for Phoenix
>>>>>> and Hawk Point platforms yes we should use the same API to expose it to
>>>>>> userspace as those other two drivers do.
>>>>>>
>>>>>> I'd like Shyam's temperature on this idea though before anyone spends
>>>>>> time on it.  If he's amenable would you want to work on it?
>>>>>
>>>>> We currently expect the 2025 lineup to include a lot of Strix Point
>>>>> handhelds, so I'd like a solution that works with that. OneXPlayer
>>>>> released a model already, and GPD is getting ready to ship as well.
>>>>>
>>>>> Yeah, I could throw some hours to it after I go through some overdue stuff.
>>>>>
>>>>>>>
>>>>>>> By the way, you were right about needing a taint for this. Strix Point
>>>>>>> fails to enter a lower power state during sleep if you set it to lower
>>>>>>> than 10W. This is not ideal, as hawk point could go down to 5 while
>>>>>>> still showing a power difference, but I am unsure where this bug
>>>>>>> should be reported. This is both through ryzenadj/ALIB
>>>>>>
>>>>>> Who is to say this is a bug?  Abusing a debugging interface with a
>>>>>> reverse engineered tool means you might be able to configure a platform
>>>>>> out of specifications.
>>>>>
>>>>> The spec being 10+W would be very undesirable for handhelds with Strix
>>>>> Point, so I'd hope somebody looks into it, esp. if it can be fixed
>>>>> with a BIOS fw update before more handhelds come out. I can raise the
>>>>> minimum TDP to 10W, with some user complaints.
>>>>>
>>>>> Asus and Lenovo use the same mailbox so they'd share the issue too.
>>>>>
>>>>> FYI for a typical handheld with e.g., a 60Wh battery, a 10W envelope
>>>>> results in around 20-22W total consumption which is around 2.5 hours.
>>>>> Hawk Point can be TDP limited down to 16W total consumption (TDP ~7W)
>>>>> and can go down to 8W with frame limiting etc. I do not have numbers
>>>>> for Strix Point yet, but to match Hawk Point it has to allow TDP to go
>>>>> down to 7W. I think for 2025, customer expectation will be 6-8 hours+
>>>>> at low wattages.
>>>>>
>>>>
>>>> I've got a fundamental question - why the fixation on PPT?
>>>>
>>>> This just sets "limits" for the package.  In Windows it's probably the
>>>> best knob to tune to adjust performance in an effort to extend battery
>>>> life, but in Linux we have a lot of other knobs:
>>>>
>>>> * the ability to tune EPP (energy_performance_preference)
>>>> * set min and max CPU frequencies (scaling_min_freq, scaling_max_freq)
>>>
>>> We use both of these.
>>>
>>>> * offline cores at will
>>>
>>> if a core is parked and you try to write into its sysfs entrypoints,
>>> we found that this might cause a userspace program to hang
>>> indefinitely. Since a lot of settings are per core that's problematic
>>> and since it does not help much most TDP programs dont offer it
>>> anymore.
>>
>> This sounds like a kernel bug if you're hanging programs when trying to
>> write to sysfs files of offlined cores.  If we can get that fixed having
>> that in your toolbelt is quite useful.  I'm sure there are plenty of
>> games that don't really need all the cores up and you can save some power.
>>
>> Can you get a simple reproducer for me into a bug report to look at next
>> year?
> 
> I will try to. This was relayed to me. 

Thanks! If whoever relayed it to you opens the bug report that's totally 
fine with me too.  Just ping me after the new year if I miss it because 
it will be lost in a giant pile of other stuff.

> Disabling SMT also causes a
> crash on the Ally when going to sleep.

Yes; SMT is require to be enabled for s0i3 to work.

That's why the s2idle debugging script flags it.

https://gitlab.freedesktop.org/drm/amd/-/blob/master/scripts/amd_s2idle.py#L1207
> 
>>>
>>>> * change DPM setting in the GPU driver (power_dpm_force_performance_level)
>>>
>>> I think we played with this mostly to try to get lower than 800mhz.
>>> However, going lower than 800mhz in these APUs causes issues.
>>>
>>>> All the core related knobs can be changed on a per-core basis.  So for
>>>> example even on a non-heterogeneous design you could potentially make it
>>>> perform "like" a hetero design where you set it so that some cores don't
>>>> go above nominal frequency or the EPP value is tuned less aggressively
>>>> on some cores.
>>>
>>>> These knobs can have just as drastic of a result on battery life as
>>>> adjusting the various power limiting knobs.  Most importantly these
>>>> knobs have architectural limits that you won't be able to override so
>>>> you can safely change them to min/max and see what happens.
>>>
>>> I feel like we are discussing different targets here. When it comes to
>>> computing tasks, you have a certain block of work that needs to be
>>> done and after that the CPU is free. In this case, programs like tuned
>>> (allegedly) optimize these settings so that they take the minimum
>>> amount of power to complete that block of work.
>>>
>>> However, games are different. Games have no problem burning power if
>>> you let them and they are also playable at a variety of power levels.
>>> Typically, unless the user caps the framerate and video quality of the
>>> game it will use the full slow temp limit value. Even if they do set
>>> that, the game will typically burn 3-4W more than what is needed
>>> depending on TDP, EPP etc.
>>
>> Part of what I'm wondering is if our 4 levels of EPP values "aren't
>> enough" for optimization on a per game basis.
>>
>> IMO They're incredibly rigid.  I do have a patch that can expose "raw"
>> numbers for amd-pstate like intel-pstate does, but I haven't brought it
>> on the lists yet because I'm still discussing it with others internal to
>> AMD.
>>
>> EPP is really about responsiveness in games.
> 
> EPP performance is so detrimental we hide it. It destroys performance
> by sucking power from the GPU. EPP balance_performance is only useful
> in certain emulators that need a lot of CPU. Only balance_power is
> useful. Then, for TDPs lower than 10, setting EPP to power milks
> another 1-2W

Yes; this is a different conversation once you're talking about the 
power share between CPU cores and GPU.  It's part of why I was talking 
about core parking when you don't need all the cores.
diff mbox series

Patch

diff --git a/Documentation/ABI/testing/sysfs-amd-pmf b/Documentation/ABI/testing/sysfs-amd-pmf
index 7fc0e1c2b76b..6f3d5cbf443f 100644
--- a/Documentation/ABI/testing/sysfs-amd-pmf
+++ b/Documentation/ABI/testing/sysfs-amd-pmf
@@ -11,3 +11,13 @@  Description:	Reading this file tells if the AMD Platform Management(PMF)
 		To turn off CnQF user can write "off" to the sysfs node.
 		Note: Systems that support auto mode will not have this sysfs file
 		available.
+
+What:		/sys/devices/platform/*/{spl, fppt, sppt, sppt_apu_only, stt_min, stt_limit_apu, stt_skip_temp}
+Date:		December 2024
+Contact:	Mario Limonciello <mario.limonciello@amd.com>
+Description:	Manual control of AMD PMF APU coefficients
+		.
+		These files are used to manually control the APU coefficients.
+		In order to write to these files the module most be
+		loaded with manual_control=1 and the user must write "custom"
+		to the ACPI platform profile.
diff --git a/drivers/platform/x86/amd/pmf/Makefile b/drivers/platform/x86/amd/pmf/Makefile
index 7d6079b02589..81444d6f4428 100644
--- a/drivers/platform/x86/amd/pmf/Makefile
+++ b/drivers/platform/x86/amd/pmf/Makefile
@@ -7,4 +7,5 @@ 
 obj-$(CONFIG_AMD_PMF) += amd-pmf.o
 amd-pmf-objs := core.o acpi.o sps.o \
 		auto-mode.o cnqf.o \
+		manual.o \
 		tee-if.o spc.o pmf-quirks.o
diff --git a/drivers/platform/x86/amd/pmf/core.c b/drivers/platform/x86/amd/pmf/core.c
index d6af0ca036f1..52a68ca094be 100644
--- a/drivers/platform/x86/amd/pmf/core.c
+++ b/drivers/platform/x86/amd/pmf/core.c
@@ -53,6 +53,10 @@  static bool force_load;
 module_param(force_load, bool, 0444);
 MODULE_PARM_DESC(force_load, "Force load this driver on supported older platforms (experimental)");
 
+bool pmf_manual_control;
+module_param_named(manual_control, pmf_manual_control, bool, 0444);
+MODULE_PARM_DESC(manual_control, "Expose manual control knobs (experimental)");
+
 static int amd_pmf_pwr_src_notify_call(struct notifier_block *nb, unsigned long event, void *data)
 {
 	struct amd_pmf_dev *pmf = container_of(nb, struct amd_pmf_dev, pwr_src_notifier);
@@ -349,6 +353,10 @@  static void amd_pmf_init_features(struct amd_pmf_dev *dev)
 		dev_dbg(dev->dev, "SPS enabled and Platform Profiles registered\n");
 	}
 
+	if (pmf_manual_control) {
+		amd_pmf_init_manual_control(dev);
+		return;
+	}
 	amd_pmf_init_smart_pc(dev);
 	if (dev->smart_pc_enabled) {
 		dev_dbg(dev->dev, "Smart PC Solution Enabled\n");
@@ -485,6 +493,7 @@  static void amd_pmf_remove(struct platform_device *pdev)
 
 static const struct attribute_group *amd_pmf_driver_groups[] = {
 	&cnqf_feature_attribute_group,
+	&manual_attribute_group,
 	NULL,
 };
 
diff --git a/drivers/platform/x86/amd/pmf/manual.c b/drivers/platform/x86/amd/pmf/manual.c
new file mode 100644
index 000000000000..b33fc3cd8d61
--- /dev/null
+++ b/drivers/platform/x86/amd/pmf/manual.c
@@ -0,0 +1,88 @@ 
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * AMD Platform Management Framework Driver
+ *
+ * Copyright (c) 2024, Advanced Micro Devices, Inc.
+ * All Rights Reserved.
+ *
+ * Author: Mario Limonciello <mario.limonciello@amd.com>
+ */
+
+#include "pmf.h"
+
+#define pmf_manual_attribute(_name, _set_command, _get_command)		\
+static ssize_t _name##_store(struct device *d,				\
+			     struct device_attribute *attr,		\
+			     const char *buf, size_t count)		\
+{									\
+	struct amd_pmf_dev *dev = dev_get_drvdata(d);			\
+	uint val;							\
+									\
+	if (dev->current_profile != PLATFORM_PROFILE_CUSTOM) {		\
+		dev_warn_once(dev->dev,					\
+			      "Manual control is disabled, please set "	\
+			      "platform profile to custom.\n");		\
+		return -EINVAL;						\
+	}								\
+									\
+	if (kstrtouint(buf, 10, &val) < 0)				\
+		return -EINVAL;						\
+									\
+	amd_pmf_send_cmd(dev, _set_command, false, val, NULL);		\
+									\
+	return count;							\
+}									\
+static ssize_t _name##_show(struct device *d,				\
+			   struct device_attribute *attr,		\
+			   char *buf)					\
+{									\
+	struct amd_pmf_dev *dev = dev_get_drvdata(d);			\
+	uint val;							\
+									\
+	amd_pmf_send_cmd(dev, _get_command, true, ARG_NONE, &val);	\
+									\
+	return sysfs_emit(buf, "%u\n", val);				\
+}
+
+pmf_manual_attribute(spl, SET_SPL, GET_SPL);
+static DEVICE_ATTR_RW(spl);
+pmf_manual_attribute(fppt, SET_FPPT, GET_FPPT);
+static DEVICE_ATTR_RW(fppt);
+pmf_manual_attribute(sppt, SET_SPPT, GET_SPPT);
+static DEVICE_ATTR_RW(sppt);
+pmf_manual_attribute(sppt_apu_only, SET_SPPT_APU_ONLY, GET_SPPT_APU_ONLY);
+static DEVICE_ATTR_RW(sppt_apu_only);
+pmf_manual_attribute(stt_min, SET_STT_MIN_LIMIT, GET_STT_MIN_LIMIT);
+static DEVICE_ATTR_RW(stt_min);
+pmf_manual_attribute(stt_limit_apu, SET_STT_LIMIT_APU, GET_STT_LIMIT_APU);
+static DEVICE_ATTR_RW(stt_limit_apu);
+pmf_manual_attribute(stt_skin_temp, SET_STT_LIMIT_HS2, GET_STT_LIMIT_HS2);
+static DEVICE_ATTR_RW(stt_skin_temp);
+
+static umode_t manual_attr_is_visible(struct kobject *kobj, struct attribute *attr, int idx)
+{
+	return pmf_manual_control ? 0660 : 0;
+}
+
+static struct attribute *manual_attrs[] = {
+	&dev_attr_spl.attr,
+	&dev_attr_fppt.attr,
+	&dev_attr_sppt.attr,
+	&dev_attr_sppt_apu_only.attr,
+	&dev_attr_stt_min.attr,
+	&dev_attr_stt_limit_apu.attr,
+	&dev_attr_stt_skin_temp.attr,
+	NULL,
+};
+
+const struct attribute_group manual_attribute_group = {
+	.attrs = manual_attrs,
+	.is_visible = manual_attr_is_visible,
+};
+
+void amd_pmf_init_manual_control(struct amd_pmf_dev *dev)
+{
+	add_taint(TAINT_CPU_OUT_OF_SPEC, LOCKDEP_STILL_OK);
+	pr_crit("Manual PMF control is enabled, please disable it before "
+		"reporting any bugs unrelated to PMF.\n");
+}
diff --git a/drivers/platform/x86/amd/pmf/pmf.h b/drivers/platform/x86/amd/pmf/pmf.h
index 8ce8816da9c1..ca3df63cf190 100644
--- a/drivers/platform/x86/amd/pmf/pmf.h
+++ b/drivers/platform/x86/amd/pmf/pmf.h
@@ -798,4 +798,9 @@  void amd_pmf_dump_ta_inputs(struct amd_pmf_dev *dev, struct ta_pmf_enact_table *
 /* Quirk infrastructure */
 void amd_pmf_quirks_init(struct amd_pmf_dev *dev);
 
+/* Manual configuration */
+extern bool pmf_manual_control;
+extern const struct attribute_group manual_attribute_group;
+void amd_pmf_init_manual_control(struct amd_pmf_dev *dev);
+
 #endif /* PMF_H */
diff --git a/drivers/platform/x86/amd/pmf/sps.c b/drivers/platform/x86/amd/pmf/sps.c
index 92f7fb22277d..6db88e523a86 100644
--- a/drivers/platform/x86/amd/pmf/sps.c
+++ b/drivers/platform/x86/amd/pmf/sps.c
@@ -305,6 +305,8 @@  int amd_pmf_get_pprof_modes(struct amd_pmf_dev *pmf)
 	case PLATFORM_PROFILE_LOW_POWER:
 		mode = POWER_MODE_POWER_SAVER;
 		break;
+	case PLATFORM_PROFILE_CUSTOM:
+		return 0;
 	default:
 		dev_err(pmf->dev, "Unknown Platform Profile.\n");
 		return -EOPNOTSUPP;
@@ -412,6 +414,8 @@  int amd_pmf_init_sps(struct amd_pmf_dev *dev)
 	set_bit(PLATFORM_PROFILE_LOW_POWER, dev->pprof.choices);
 	set_bit(PLATFORM_PROFILE_BALANCED, dev->pprof.choices);
 	set_bit(PLATFORM_PROFILE_PERFORMANCE, dev->pprof.choices);
+	if (pmf_manual_control)
+		set_bit(PLATFORM_PROFILE_CUSTOM, dev->pprof.choices);
 
 	/* Create platform_profile structure and register */
 	err = platform_profile_register(&dev->pprof);