mbox series

[RFC,0/1] platform: x86: tuxi: Implement TUXEDO TUXI ACPI TFAN as thermal subsystem

Message ID 20241127112141.42920-1-wse@tuxedocomputers.com (mailing list archive)
Headers show
Series platform: x86: tuxi: Implement TUXEDO TUXI ACPI TFAN as thermal subsystem | expand

Message

Werner Sembach Nov. 27, 2024, 11:21 a.m. UTC
Hi,

Following up to https://lore.kernel.org/all/172b7acd-4313-4924-bcbc-41b73b39ada0@tuxedocomputers.com/ and https://lore.kernel.org/all/f26d867e-f247-43bb-a78b-be0bce35c973@roeck-us.net/ I experimented with the thermal subsystem and these are my results so far, but I'm hitting a bit of a wall:

As far as I can tell to implement "2. As long as GTMP is > 80°C fan speed must be at least 30%." I would need to add a new gevenor, lets call it "user_space_with_safeguards". I would be nice when the temp <-> fanspeed relation could be passed via the thermal_trip structure. And safeguarding the hardware from userspace only works when I can restrict userspace from just selecting the preexisting "user_space" govenor.

So my ideas/questions:
- Add a field "min_fanspeed_percent" to the thermal_trip struct that will only be used by the "user_space_with_safeguards" govenor
- Add a "user_space_with_safeguards" govenor that is the same as the "user_space" govenor, but on trip, a minimum speed is applied
- How can i ensure that on further speed updates the min speed is applied to? I could just implement it directly in the cdev, but that would be spagetti coding around the govenor.
- Can I somehow restrict userspace from using certain govenors?
- I'm a litte bit confused about the thermal zone "mode" sysfs switch, here it says deactivate for userspace control: https://elixir.bootlin.com/linux/v6.12/source/Documentation/ABI/testing/sysfs-class-thermal#L20, but what about the user_space govenor then?

Best regards,
Werner

Werner Sembach (1):
  platform: x86: tuxi: Implement TUXEDO TUXI ACPI TFAN as thermal
    subsystem

 drivers/platform/x86/Makefile                 |   3 +
 drivers/platform/x86/tuxedo/Kbuild            |   6 +
 drivers/platform/x86/tuxedo/nbxx/Kbuild       |   8 +
 drivers/platform/x86/tuxedo/nbxx/Kconfig      |   9 +
 drivers/platform/x86/tuxedo/nbxx/acpi_tuxi.c  |  96 +++++++
 drivers/platform/x86/tuxedo/nbxx/acpi_tuxi.h  |  84 ++++++
 .../x86/tuxedo/nbxx/acpi_tuxi_thermal.c       | 241 ++++++++++++++++++
 .../x86/tuxedo/nbxx/acpi_tuxi_thermal.h       |  14 +
 8 files changed, 461 insertions(+)
 create mode 100644 drivers/platform/x86/tuxedo/Kbuild
 create mode 100644 drivers/platform/x86/tuxedo/nbxx/Kbuild
 create mode 100644 drivers/platform/x86/tuxedo/nbxx/Kconfig
 create mode 100644 drivers/platform/x86/tuxedo/nbxx/acpi_tuxi.c
 create mode 100644 drivers/platform/x86/tuxedo/nbxx/acpi_tuxi.h
 create mode 100644 drivers/platform/x86/tuxedo/nbxx/acpi_tuxi_thermal.c
 create mode 100644 drivers/platform/x86/tuxedo/nbxx/acpi_tuxi_thermal.h

Comments

Armin Wolf Dec. 1, 2024, 5:58 p.m. UTC | #1
Am 27.11.24 um 12:21 schrieb Werner Sembach:

> Hi,
>
> Following up to https://lore.kernel.org/all/172b7acd-4313-4924-bcbc-41b73b39ada0@tuxedocomputers.com/ and https://lore.kernel.org/all/f26d867e-f247-43bb-a78b-be0bce35c973@roeck-us.net/ I experimented with the thermal subsystem and these are my results so far, but I'm hitting a bit of a wall:
>
> As far as I can tell to implement "2. As long as GTMP is > 80°C fan speed must be at least 30%." I would need to add a new gevenor, lets call it "user_space_with_safeguards". I would be nice when the temp <-> fanspeed relation could be passed via the thermal_trip structure. And safeguarding the hardware from userspace only works when I can restrict userspace from just selecting the preexisting "user_space" govenor.
>
> So my ideas/questions:
> - Add a field "min_fanspeed_percent" to the thermal_trip struct that will only be used by the "user_space_with_safeguards" govenor
> - Add a "user_space_with_safeguards" govenor that is the same as the "user_space" govenor, but on trip, a minimum speed is applied
> - How can i ensure that on further speed updates the min speed is applied to? I could just implement it directly in the cdev, but that would be spagetti coding around the govenor.
> - Can I somehow restrict userspace from using certain govenors?
> - I'm a litte bit confused about the thermal zone "mode" sysfs switch, here it says deactivate for userspace control: https://elixir.bootlin.com/linux/v6.12/source/Documentation/ABI/testing/sysfs-class-thermal#L20, but what about the user_space govenor then?

Hi,

i am having little experience with the thermal subsystem, but i suggest that policy decisions like "min_fanspeed_percent" should either:

- come from the hardware/firmware itself
- come from userspace

Effectively this driver tries to enforce a Tuxedo-specific policy that is not directly based on hardware limits. The book "Linux Device Drivers"
says:

	"the role of a device driver is providing///mechanism/, not/policy/."

Furthermore:

	"When/writing/ drivers, a programmer should pay particular attention to this fundamental concept: write kernel code to access the hardware,
	 but don't force particular policies on the user, since different users have different needs. The driver should deal with making the hardware
	 available, leaving all the issues about/how/ to use the hardware to the applications. A driver, then, is flexible if it offers access to the
	 hardware capabilities without adding constraints."

The issue is that the Tuxedo-specific policy is not directly connected with the hardware. Some hardware might need a bigger minimum fan speed than
other hardware for example.

The hardware (or rather firmware in this case) should communicate those constraints to the linux driver so that we do not need to rely on random
temperatures for hardware protection.

This ACPI interface however basically provides us with a hwmon interface and Tuxedo now wants the kernel to enforce their policy on it. I suspect that
happens for warranty reasons, right?

Maybe there is a way to enforce this policy through userspace applications?

Thanks,
Armin Wolf

> Best regards,
> Werner
>
> Werner Sembach (1):
>    platform: x86: tuxi: Implement TUXEDO TUXI ACPI TFAN as thermal
>      subsystem
>
>   drivers/platform/x86/Makefile                 |   3 +
>   drivers/platform/x86/tuxedo/Kbuild            |   6 +
>   drivers/platform/x86/tuxedo/nbxx/Kbuild       |   8 +
>   drivers/platform/x86/tuxedo/nbxx/Kconfig      |   9 +
>   drivers/platform/x86/tuxedo/nbxx/acpi_tuxi.c  |  96 +++++++
>   drivers/platform/x86/tuxedo/nbxx/acpi_tuxi.h  |  84 ++++++
>   .../x86/tuxedo/nbxx/acpi_tuxi_thermal.c       | 241 ++++++++++++++++++
>   .../x86/tuxedo/nbxx/acpi_tuxi_thermal.h       |  14 +
>   8 files changed, 461 insertions(+)
>   create mode 100644 drivers/platform/x86/tuxedo/Kbuild
>   create mode 100644 drivers/platform/x86/tuxedo/nbxx/Kbuild
>   create mode 100644 drivers/platform/x86/tuxedo/nbxx/Kconfig
>   create mode 100644 drivers/platform/x86/tuxedo/nbxx/acpi_tuxi.c
>   create mode 100644 drivers/platform/x86/tuxedo/nbxx/acpi_tuxi.h
>   create mode 100644 drivers/platform/x86/tuxedo/nbxx/acpi_tuxi_thermal.c
>   create mode 100644 drivers/platform/x86/tuxedo/nbxx/acpi_tuxi_thermal.h
>
Hans de Goede Dec. 2, 2024, 9:40 a.m. UTC | #2
Hi Armin, Werner,

On 1-Dec-24 6:58 PM, Armin Wolf wrote:
> Am 27.11.24 um 12:21 schrieb Werner Sembach:
> 
>> Hi,
>>
>> Following up to https://lore.kernel.org/all/172b7acd-4313-4924-bcbc-41b73b39ada0@tuxedocomputers.com/ and https://lore.kernel.org/all/f26d867e-f247-43bb-a78b-be0bce35c973@roeck-us.net/ I experimented with the thermal subsystem and these are my results so far, but I'm hitting a bit of a wall:
>>
>> As far as I can tell to implement "2. As long as GTMP is > 80°C fan speed must be at least 30%." I would need to add a new gevenor, lets call it "user_space_with_safeguards". I would be nice when the temp <-> fanspeed relation could be passed via the thermal_trip structure. And safeguarding the hardware from userspace only works when I can restrict userspace from just selecting the preexisting "user_space" govenor.
>>
>> So my ideas/questions:
>> - Add a field "min_fanspeed_percent" to the thermal_trip struct that will only be used by the "user_space_with_safeguards" govenor
>> - Add a "user_space_with_safeguards" govenor that is the same as the "user_space" govenor, but on trip, a minimum speed is applied
>> - How can i ensure that on further speed updates the min speed is applied to? I could just implement it directly in the cdev, but that would be spagetti coding around the govenor.
>> - Can I somehow restrict userspace from using certain govenors?
>> - I'm a litte bit confused about the thermal zone "mode" sysfs switch, here it says deactivate for userspace control: https://elixir.bootlin.com/linux/v6.12/source/Documentation/ABI/testing/sysfs-class-thermal#L20, but what about the user_space govenor then?
> 
> Hi,
> 
> i am having little experience with the thermal subsystem, but i suggest that policy decisions like "min_fanspeed_percent" should either:
> 
> - come from the hardware/firmware itself
> - come from userspace
> 
> Effectively this driver tries to enforce a Tuxedo-specific policy that is not directly based on hardware limits. The book "Linux Device Drivers"
> says:
> 
>     "the role of a device driver is providing///mechanism/, not/policy/."
> 
> Furthermore:
> 
>     "When/writing/ drivers, a programmer should pay particular attention to this fundamental concept: write kernel code to access the hardware,
>      but don't force particular policies on the user, since different users have different needs. The driver should deal with making the hardware
>      available, leaving all the issues about/how/ to use the hardware to the applications. A driver, then, is flexible if it offers access to the
>      hardware capabilities without adding constraints."
> 
> The issue is that the Tuxedo-specific policy is not directly connected with the hardware. Some hardware might need a bigger minimum fan speed than
> other hardware for example.
> 
> The hardware (or rather firmware in this case) should communicate those constraints to the linux driver so that we do not need to rely on random
> temperatures for hardware protection.
> 
> This ACPI interface however basically provides us with a hwmon interface and Tuxedo now wants the kernel to enforce their policy on it. I suspect that
> happens for warranty reasons, right?
> 
> Maybe there is a way to enforce this policy through userspace applications?

An important role of device-drivers is also to avoid driving hardware outside
its valid specifications. E.g. charger drivers in the power_supply subsystem
have constant_charge_current and constant_charge_voltage settings which
give a max charging speed in Amperes and a max charging Voltage (typically
4.2V for single Li-xxx cell batteries).

Setting these too high can be very dangerous leading to batteries catching
fire / exploding. so naturally the drivers also have and enforce
constant_charge_current_max and constant_charge_voltage_max sysfs attributes
and the values there do not reflect the maximum what the charger-chip can
handle, but the maximum which the manufacturer has specified for the battery.

This info can e.g. come from devicetree but if not present then the values
applied by the firmware at boot should be read back and used not only
as active, but also as max values.

I believe what Werner wants is pretty much the same thing, allow userspace
to provide its own fan management, but have the kernel monitor the temperature
and if the temperature goes say above 80° Celsius, enforce a min fan-speed of
30% and at 90° enforce a fan-speed of say 70%.

Yes the exact values are a for of policy, but this is a policy to protect
hw from getting damaged and as such is fine to have in the kernel.

As for how to implement this with the thermal subsystem (which indeed looks
like it might be the way to go) I'm not familiar enough with the thermal
subsystem either.

Werner, I suggest that you send an email to the thermal subsystem maintainers
and ask them about this. It would be good to start with explaining your
problem before asking the questions. I would leave out details about the ACPI
interface. Just say you have a temp-sensor + pwm to control fan speed combination
where you want to allow userspace control while having the kernel monitor
things and have the kernel overriding userspace's fan speed setting when
things get too hot, to deal with e.g. the userspace process crashing at
a low fan speed setting.

Regards,

Hans
Werner Sembach Dec. 2, 2024, 2:37 p.m. UTC | #3
Hi Hans, hi Armin,

Am 02.12.24 um 10:40 schrieb Hans de Goede:
> Hi Armin, Werner,
>
> On 1-Dec-24 6:58 PM, Armin Wolf wrote:
>> Am 27.11.24 um 12:21 schrieb Werner Sembach:
>>
>>> Hi,
>>>
>>> Following up to https://lore.kernel.org/all/172b7acd-4313-4924-bcbc-41b73b39ada0@tuxedocomputers.com/ and https://lore.kernel.org/all/f26d867e-f247-43bb-a78b-be0bce35c973@roeck-us.net/ I experimented with the thermal subsystem and these are my results so far, but I'm hitting a bit of a wall:
>>>
>>> As far as I can tell to implement "2. As long as GTMP is > 80°C fan speed must be at least 30%." I would need to add a new gevenor, lets call it "user_space_with_safeguards". I would be nice when the temp <-> fanspeed relation could be passed via the thermal_trip structure. And safeguarding the hardware from userspace only works when I can restrict userspace from just selecting the preexisting "user_space" govenor.
>>>
>>> So my ideas/questions:
>>> - Add a field "min_fanspeed_percent" to the thermal_trip struct that will only be used by the "user_space_with_safeguards" govenor
>>> - Add a "user_space_with_safeguards" govenor that is the same as the "user_space" govenor, but on trip, a minimum speed is applied
>>> - How can i ensure that on further speed updates the min speed is applied to? I could just implement it directly in the cdev, but that would be spagetti coding around the govenor.
>>> - Can I somehow restrict userspace from using certain govenors?
>>> - I'm a litte bit confused about the thermal zone "mode" sysfs switch, here it says deactivate for userspace control: https://elixir.bootlin.com/linux/v6.12/source/Documentation/ABI/testing/sysfs-class-thermal#L20, but what about the user_space govenor then?
>> Hi,
>>
>> i am having little experience with the thermal subsystem, but i suggest that policy decisions like "min_fanspeed_percent" should either:
>>
>> - come from the hardware/firmware itself
>> - come from userspace
>>
>> Effectively this driver tries to enforce a Tuxedo-specific policy that is not directly based on hardware limits. The book "Linux Device Drivers"
>> says:
>>
>>      "the role of a device driver is providing///mechanism/, not/policy/."

But it is a hardware limit, as the hardware will wear and might break if not run 
within its limits.

As this driver is for already released hardware, a firmware fix is not possible.

>>
>> Furthermore:
>>
>>      "When/writing/ drivers, a programmer should pay particular attention to this fundamental concept: write kernel code to access the hardware,
>>       but don't force particular policies on the user, since different users have different needs. The driver should deal with making the hardware
>>       available, leaving all the issues about/how/ to use the hardware to the applications. A driver, then, is flexible if it offers access to the
>>       hardware capabilities without adding constraints."
>>
>> The issue is that the Tuxedo-specific policy is not directly connected with the hardware. Some hardware might need a bigger minimum fan speed than
>> other hardware for example.

Currently this driver is for exactly 2 TUXEDO devices (Sirius 16 gen 1 & 2).

Future devices can have different limits provided by the ACPI or a list in the 
driver.

>>
>> The hardware (or rather firmware in this case) should communicate those constraints to the linux driver so that we do not need to rely on random
>> temperatures for hardware protection.
But then the driver still needs to enforce them.
>>
>> This ACPI interface however basically provides us with a hwmon interface and Tuxedo now wants the kernel to enforce their policy on it. I suspect that
>> happens for warranty reasons, right?

The ACPI interface is only present on TUXEDO devices (TUXI stands for TUXEDO 
Interface and TFAN for TUXEDO FAN). And we want the policy for the devices to 
live longer, not only for warranty.

TUXI meant as a simple getter and setter interface for the EC. We don't have 
access to the EC firmware source so more complex stuff needs to be done 
somewhere else, i.e. the driver.

>>
>> Maybe there is a way to enforce this policy through userspace applications?
That's no enforcement as userspace can just write the sysfs directly.
> An important role of device-drivers is also to avoid driving hardware outside
> its valid specifications. E.g. charger drivers in the power_supply subsystem
> have constant_charge_current and constant_charge_voltage settings which
> give a max charging speed in Amperes and a max charging Voltage (typically
> 4.2V for single Li-xxx cell batteries).
>
> Setting these too high can be very dangerous leading to batteries catching
> fire / exploding. so naturally the drivers also have and enforce
> constant_charge_current_max and constant_charge_voltage_max sysfs attributes
> and the values there do not reflect the maximum what the charger-chip can
> handle, but the maximum which the manufacturer has specified for the battery.
>
> This info can e.g. come from devicetree but if not present then the values
> applied by the firmware at boot should be read back and used not only
> as active, but also as max values.
>
> I believe what Werner wants is pretty much the same thing, allow userspace
> to provide its own fan management, but have the kernel monitor the temperature
> and if the temperature goes say above 80° Celsius, enforce a min fan-speed of
> 30% and at 90° enforce a fan-speed of say 70%.
>
> Yes the exact values are a for of policy, but this is a policy to protect
> hw from getting damaged and as such is fine to have in the kernel.
>
> As for how to implement this with the thermal subsystem (which indeed looks
> like it might be the way to go) I'm not familiar enough with the thermal
> subsystem either.
>
> Werner, I suggest that you send an email to the thermal subsystem maintainers
> and ask them about this. It would be good to start with explaining your
> problem before asking the questions. I would leave out details about the ACPI
> interface. Just say you have a temp-sensor + pwm to control fan speed combination
> where you want to allow userspace control while having the kernel monitor
> things and have the kernel overriding userspace's fan speed setting when
> things get too hot, to deal with e.g. the userspace process crashing at
> a low fan speed setting.
Ack
>
> Regards,
>
> Hans
>
Best regards,

Werner