diff mbox series

[net-next,3/3] net: phy: realtek: add hwmon support for temp sensor on RTL822x

Message ID dbfeb139-808f-4345-afe8-830b7f4da26a@gmail.com (mailing list archive)
State Handled Elsewhere
Headers show
Series None | expand

Commit Message

Heiner Kallweit Jan. 10, 2025, 11:48 a.m. UTC
This adds hwmon support for the temperature sensor on RTL822x. It's
available on the standalone versions of the PHY's, and on the integrated
PHY's in RTL8125B/RTL8125D/RTL8126.

Notes:
- over-temp threshold is set by PHY power-on default or BIOS / boot loader
  and it's read-only
- over-temp alarm remains set, even if temperature drops below threshold

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/Kconfig         |  6 +++
 drivers/net/phy/Makefile        |  1 +
 drivers/net/phy/realtek.h       | 10 ++++
 drivers/net/phy/realtek_hwmon.c | 83 +++++++++++++++++++++++++++++++++
 drivers/net/phy/realtek_main.c  | 12 +++++
 5 files changed, 112 insertions(+)
 create mode 100644 drivers/net/phy/realtek.h
 create mode 100644 drivers/net/phy/realtek_hwmon.c

Comments

Andrew Lunn Jan. 10, 2025, 9:10 p.m. UTC | #1
> - over-temp alarm remains set, even if temperature drops below threshold

> +int rtl822x_hwmon_init(struct phy_device *phydev)
> +{
> +	struct device *hwdev, *dev = &phydev->mdio.dev;
> +	const char *name;
> +
> +	/* Ensure over-temp alarm is reset. */
> +	phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3);

So it is possible to clear the alarm.

I know you wanted to experiment with this some more....

If the alarm is still set, does that prevent the PHY renegotiating the
higher link speed? If you clear the alarm, does that allow it to
renegotiate the higher link speed? Or is a down/up still required?
Does an down/up clear the alarm if the temperature is below the
threshold?

Also, does HWMON support clearing alarms? Writing a 0 to the file? Or
are they supported to self clear on read?

I'm wondering if we are heading towards ABI issues? You have defined:

- over-temp alarm remains set, even if temperature drops below threshold

so that kind of eliminates the possibility of implementing self
clearing any time in the future. Explicit clearing via a write is
probably O.K, because the user needs to take an explicit action.  Are
there other ABI issues i have not thought about.

	Andrew
Heiner Kallweit Jan. 10, 2025, 9:41 p.m. UTC | #2
On 10.01.2025 22:10, Andrew Lunn wrote:
>> - over-temp alarm remains set, even if temperature drops below threshold
> 
>> +int rtl822x_hwmon_init(struct phy_device *phydev)
>> +{
>> +	struct device *hwdev, *dev = &phydev->mdio.dev;
>> +	const char *name;
>> +
>> +	/* Ensure over-temp alarm is reset. */
>> +	phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3);
> 
> So it is possible to clear the alarm.
> 
> I know you wanted to experiment with this some more....
> 
> If the alarm is still set, does that prevent the PHY renegotiating the
> higher link speed? If you clear the alarm, does that allow it to
> renegotiate the higher link speed? Or is a down/up still required?
> Does an down/up clear the alarm if the temperature is below the
> threshold?
> 
I tested wrt one of your previous questions, when exceeding the
temperature threshold the chip actually removes 2.5Gbps from the
advertisement register.

If the alarm is set, the chip won't switch back automatically to
2.5Gbps even if the temperature drops below the alarm threshold.

When clearing the alarm the chip adds 2.5Gbps back to the advertisement
register. Worth to be mentioned:
The temperature is checked only if the link speed is 2.5Gbps.
Therefore the chip thinks it's safe to add back the 2.5Gbps mode
when the alarm is cleared.

What I didn't test is whether it's possible to manually add 2.5Gbps
to the advertisement register whilst the alarm is set.
But I assume that's the case.

> Also, does HWMON support clearing alarms? Writing a 0 to the file? Or
> are they supported to self clear on read?
> 
Documentation/hwmon/sysfs-interface.rst states that the alarm
is a read-only attribute:

+-------------------------------+-----------------------+
| **`in[0-*]_alarm`,		| Channel alarm		|
| `curr[1-*]_alarm`,		|			|
| `power[1-*]_alarm`,		|   - 0: no alarm	|
| `fan[1-*]_alarm`,		|   - 1: alarm		|
| `temp[1-*]_alarm`**		|			|
|				|   RO			|
+-------------------------------+-----------------------+

Self-clearing is neither mentioned in the documentation nor
implemented in hwmon core.

@Guenter:
If alarm would just mean "current value > alarm threshold", then we
wouldn't need an extra alarm attribute, as this is something which
can be checked in user space.
Has it ever been considered that a user may have to explicitly ack
an alarm to clear it? Would you consider it an ABI violation if
alarm is configured as R/W for being able to clear the alarm?

> I'm wondering if we are heading towards ABI issues? You have defined:
> 
> - over-temp alarm remains set, even if temperature drops below threshold
> 
> so that kind of eliminates the possibility of implementing self
> clearing any time in the future. Explicit clearing via a write is
> probably O.K, because the user needs to take an explicit action.  Are
> there other ABI issues i have not thought about.
> 
> 	Andrew

Heiner
Guenter Roeck Jan. 11, 2025, 12:08 a.m. UTC | #3
On 1/10/25 13:10, Andrew Lunn wrote:
>> - over-temp alarm remains set, even if temperature drops below threshold
> 
>> +int rtl822x_hwmon_init(struct phy_device *phydev)
>> +{
>> +	struct device *hwdev, *dev = &phydev->mdio.dev;
>> +	const char *name;
>> +
>> +	/* Ensure over-temp alarm is reset. */
>> +	phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3);
> 
> So it is possible to clear the alarm.
> 
> I know you wanted to experiment with this some more....
> 
> If the alarm is still set, does that prevent the PHY renegotiating the
> higher link speed? If you clear the alarm, does that allow it to
> renegotiate the higher link speed? Or is a down/up still required?
> Does an down/up clear the alarm if the temperature is below the
> threshold?
> 
> Also, does HWMON support clearing alarms? Writing a 0 to the file? Or
> are they supported to self clear on read?
> 

Alarm attributes are supposed to self clear on read unless the condition
persists.

> I'm wondering if we are heading towards ABI issues? You have defined:
> 
> - over-temp alarm remains set, even if temperature drops below threshold
> 
> so that kind of eliminates the possibility of implementing self
> clearing any time in the future. Explicit clearing via a write is
> probably O.K, because the user needs to take an explicit action.  Are
> there other ABI issues i have not thought about.
> 

Alarm attributes are supposed to be read-only.

Guenter
Guenter Roeck Jan. 11, 2025, 12:20 a.m. UTC | #4
On 1/10/25 13:41, Heiner Kallweit wrote:
> On 10.01.2025 22:10, Andrew Lunn wrote:
>>> - over-temp alarm remains set, even if temperature drops below threshold
>>
>>> +int rtl822x_hwmon_init(struct phy_device *phydev)
>>> +{
>>> +	struct device *hwdev, *dev = &phydev->mdio.dev;
>>> +	const char *name;
>>> +
>>> +	/* Ensure over-temp alarm is reset. */
>>> +	phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3);
>>
>> So it is possible to clear the alarm.
>>
>> I know you wanted to experiment with this some more....
>>
>> If the alarm is still set, does that prevent the PHY renegotiating the
>> higher link speed? If you clear the alarm, does that allow it to
>> renegotiate the higher link speed? Or is a down/up still required?
>> Does an down/up clear the alarm if the temperature is below the
>> threshold?
>>
> I tested wrt one of your previous questions, when exceeding the
> temperature threshold the chip actually removes 2.5Gbps from the
> advertisement register.
> 
> If the alarm is set, the chip won't switch back automatically to
> 2.5Gbps even if the temperature drops below the alarm threshold.
> 
> When clearing the alarm the chip adds 2.5Gbps back to the advertisement
> register. Worth to be mentioned:
> The temperature is checked only if the link speed is 2.5Gbps.
> Therefore the chip thinks it's safe to add back the 2.5Gbps mode
> when the alarm is cleared.
> 
> What I didn't test is whether it's possible to manually add 2.5Gbps
> to the advertisement register whilst the alarm is set.
> But I assume that's the case.
> 
>> Also, does HWMON support clearing alarms? Writing a 0 to the file? Or
>> are they supported to self clear on read?
>>
> Documentation/hwmon/sysfs-interface.rst states that the alarm
> is a read-only attribute:
> 
> +-------------------------------+-----------------------+
> | **`in[0-*]_alarm`,		| Channel alarm		|
> | `curr[1-*]_alarm`,		|			|
> | `power[1-*]_alarm`,		|   - 0: no alarm	|
> | `fan[1-*]_alarm`,		|   - 1: alarm		|
> | `temp[1-*]_alarm`**		|			|
> |				|   RO			|
> +-------------------------------+-----------------------+
> 
> Self-clearing is neither mentioned in the documentation nor
> implemented in hwmon core.

I would argue that self clearing is implied in "RO". This isn't a hwmon
core problem, it needs to be implemented in drivers. Many chips auto-clear
alarm attributes on read. For those this is automatic. Others need
to explicitly implement clearing alarms.

> 
> @Guenter:
> If alarm would just mean "current value > alarm threshold", then we
> wouldn't need an extra alarm attribute, as this is something which
> can be checked in user space.

Alarm attributes, if implemented properly and if a chip supports interrupts,
should generate sysfs and udev events to inform userspace. An alarm
doesn't just mean "current value > alarm threshold", it can also mean that
the current value was above the threshold at some point since the attribute
was read the last time. For that to work, the attribute must be sticky
until read.

FWIW, I am sure you'll find lots of drivers not implementing this properly,
so there is no need to search for those and use them as precedent.

If you want to support alarm attributes or not is obviously your call,
but they should be self clearing if implemented. I don't want to get complaints
along the line of "the alarm attribute is set but doesn't clear even though
the temperature (or voltage, or whatever) is below the threshold".

> Has it ever been considered that a user may have to explicitly ack
> an alarm to clear it? Would you consider it an ABI violation if
> alarm is configured as R/W for being able to clear the alarm?
> 

Yes.

Guenter

>> I'm wondering if we are heading towards ABI issues? You have defined:
>>
>> - over-temp alarm remains set, even if temperature drops below threshold
>>
>> so that kind of eliminates the possibility of implementing self
>> clearing any time in the future. Explicit clearing via a write is
>> probably O.K, because the user needs to take an explicit action.  Are
>> there other ABI issues i have not thought about.
>>
>> 	Andrew
> 
> Heiner
kernel test robot Jan. 11, 2025, 8:52 a.m. UTC | #5
Hi Heiner,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Heiner-Kallweit/net-phy-realtek-add-support-for-reading-MDIO_MMD_VEND2-regs-on-RTL8125-RTL8126/20250110-195043
base:   net-next/main
patch link:    https://lore.kernel.org/r/dbfeb139-808f-4345-afe8-830b7f4da26a%40gmail.com
patch subject: [PATCH net-next 3/3] net: phy: realtek: add hwmon support for temp sensor on RTL822x
config: parisc-randconfig-r072-20250111 (https://download.01.org/0day-ci/archive/20250111/202501111618.Lro85HiT-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250111/202501111618.Lro85HiT-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501111618.Lro85HiT-lkp@intel.com/

All error/warnings (new ones prefixed by >>):

>> drivers/net/phy/realtek_hwmon.c:3:10: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'support'
       3 |  * HWMON support for Realtek PHY's
         |          ^~~~~~~
>> drivers/net/phy/realtek_hwmon.c:3:33: warning: missing terminating ' character
       3 |  * HWMON support for Realtek PHY's
         |                                 ^
>> drivers/net/phy/realtek_hwmon.c:3:33: error: missing terminating ' character
       3 |  * HWMON support for Realtek PHY's
         |                                 ^~
>> drivers/net/phy/realtek_hwmon.c:5:39: error: stray '@' in program
       5 |  * Author: Heiner Kallweit <hkallweit1@gmail.com>
         |                                       ^
   In file included from include/uapi/asm-generic/types.h:7,
                    from ./arch/parisc/include/generated/uapi/asm/types.h:1,
                    from include/linux/bitops.h:5,
                    from include/linux/hwmon.h:15,
                    from drivers/net/phy/realtek_hwmon.c:8:
>> include/asm-generic/int-ll64.h:16:9: error: unknown type name '__s8'; did you mean '__u8'?
      16 | typedef __s8  s8;
         |         ^~~~
         |         __u8
   In file included from include/linux/kernel.h:27,
                    from arch/parisc/include/asm/bug.h:5,
                    from include/linux/bug.h:5,
                    from include/linux/thread_info.h:13,
                    from include/asm-generic/preempt.h:5,
                    from ./arch/parisc/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:79,
                    from include/linux/spinlock.h:56,
                    from include/linux/phy.h:15,
                    from drivers/net/phy/realtek_hwmon.c:9:
>> include/linux/math.h:112:9: error: unknown type name '__s8'; did you mean '__u8'?
     112 |         __##type numerator;                             \
         |         ^~
   include/linux/math.h:115:1: note: in expansion of macro '__STRUCT_FRACT'
     115 | __STRUCT_FRACT(s8)
         | ^~~~~~~~~~~~~~
   include/linux/math.h:113:9: error: unknown type name '__s8'; did you mean '__u8'?
     113 |         __##type denominator;                           \
         |         ^~
   include/linux/math.h:115:1: note: in expansion of macro '__STRUCT_FRACT'
     115 | __STRUCT_FRACT(s8)
         | ^~~~~~~~~~~~~~
   In file included from include/linux/quota.h:42,
                    from include/linux/fs.h:271,
                    from include/linux/compat.h:17,
                    from include/linux/ethtool.h:17,
                    from include/linux/phy.h:16:
>> include/uapi/linux/dqblk_xfs.h:54:9: error: unknown type name '__s8'; did you mean '__u8'?
      54 |         __s8            d_version;      /* version of this structure */
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:55:9: error: unknown type name '__s8'; did you mean '__u8'?
      55 |         __s8            d_flags;        /* FS_{USER,PROJ,GROUP}_QUOTA */
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:70:9: error: unknown type name '__s8'; did you mean '__u8'?
      70 |         __s8            d_itimer_hi;    /* upper 8 bits of timer values */
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:71:9: error: unknown type name '__s8'; did you mean '__u8'?
      71 |         __s8            d_btimer_hi;
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:72:9: error: unknown type name '__s8'; did you mean '__u8'?
      72 |         __s8            d_rtbtimer_hi;
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:73:9: error: unknown type name '__s8'; did you mean '__u8'?
      73 |         __s8            d_padding2;     /* padding2 - for future use */
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:166:9: error: unknown type name '__s8'; did you mean '__u8'?
     166 |         __s8            qs_version;     /* version number for future changes */
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:168:9: error: unknown type name '__s8'; did you mean '__u8'?
     168 |         __s8            qs_pad;         /* unused */
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:210:9: error: unknown type name '__s8'; did you mean '__u8'?
     210 |         __s8                    qs_version;     /* version for future changes */
         |         ^~~~
         |         __u8
   In file included from include/linux/ethtool.h:20:
>> include/uapi/linux/ethtool.h:2523:9: error: unknown type name '__s8'; did you mean '__u8'?
    2523 |         __s8    link_mode_masks_nwords;
         |         ^~~~
         |         __u8
--
   realtek_hwmon.c:3:10: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'support'
       3 |  * HWMON support for Realtek PHY's
         |          ^~~~~~~
   realtek_hwmon.c:3:33: warning: missing terminating ' character
       3 |  * HWMON support for Realtek PHY's
         |                                 ^
   realtek_hwmon.c:3:33: error: missing terminating ' character
       3 |  * HWMON support for Realtek PHY's
         |                                 ^~
   realtek_hwmon.c:5:39: error: stray '@' in program
       5 |  * Author: Heiner Kallweit <hkallweit1@gmail.com>
         |                                       ^
   In file included from include/uapi/asm-generic/types.h:7,
                    from arch/parisc/include/generated/uapi/asm/types.h:1,
                    from include/linux/bitops.h:5,
                    from include/linux/hwmon.h:15,
                    from realtek_hwmon.c:8:
>> include/asm-generic/int-ll64.h:16:9: error: unknown type name '__s8'; did you mean '__u8'?
      16 | typedef __s8  s8;
         |         ^~~~
         |         __u8
   In file included from include/linux/kernel.h:27,
                    from arch/parisc/include/asm/bug.h:5,
                    from include/linux/bug.h:5,
                    from include/linux/thread_info.h:13,
                    from include/asm-generic/preempt.h:5,
                    from arch/parisc/include/generated/asm/preempt.h:1,
                    from include/linux/preempt.h:79,
                    from include/linux/spinlock.h:56,
                    from include/linux/phy.h:15,
                    from realtek_hwmon.c:9:
>> include/linux/math.h:112:9: error: unknown type name '__s8'; did you mean '__u8'?
     112 |         __##type numerator;                             \
         |         ^~
   include/linux/math.h:115:1: note: in expansion of macro '__STRUCT_FRACT'
     115 | __STRUCT_FRACT(s8)
         | ^~~~~~~~~~~~~~
   include/linux/math.h:113:9: error: unknown type name '__s8'; did you mean '__u8'?
     113 |         __##type denominator;                           \
         |         ^~
   include/linux/math.h:115:1: note: in expansion of macro '__STRUCT_FRACT'
     115 | __STRUCT_FRACT(s8)
         | ^~~~~~~~~~~~~~
   In file included from include/linux/quota.h:42,
                    from include/linux/fs.h:271,
                    from include/linux/compat.h:17,
                    from include/linux/ethtool.h:17,
                    from include/linux/phy.h:16:
>> include/uapi/linux/dqblk_xfs.h:54:9: error: unknown type name '__s8'; did you mean '__u8'?
      54 |         __s8            d_version;      /* version of this structure */
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:55:9: error: unknown type name '__s8'; did you mean '__u8'?
      55 |         __s8            d_flags;        /* FS_{USER,PROJ,GROUP}_QUOTA */
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:70:9: error: unknown type name '__s8'; did you mean '__u8'?
      70 |         __s8            d_itimer_hi;    /* upper 8 bits of timer values */
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:71:9: error: unknown type name '__s8'; did you mean '__u8'?
      71 |         __s8            d_btimer_hi;
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:72:9: error: unknown type name '__s8'; did you mean '__u8'?
      72 |         __s8            d_rtbtimer_hi;
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:73:9: error: unknown type name '__s8'; did you mean '__u8'?
      73 |         __s8            d_padding2;     /* padding2 - for future use */
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:166:9: error: unknown type name '__s8'; did you mean '__u8'?
     166 |         __s8            qs_version;     /* version number for future changes */
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:168:9: error: unknown type name '__s8'; did you mean '__u8'?
     168 |         __s8            qs_pad;         /* unused */
         |         ^~~~
         |         __u8
   include/uapi/linux/dqblk_xfs.h:210:9: error: unknown type name '__s8'; did you mean '__u8'?
     210 |         __s8                    qs_version;     /* version for future changes */
         |         ^~~~
         |         __u8
   In file included from include/linux/ethtool.h:20:
>> include/uapi/linux/ethtool.h:2523:9: error: unknown type name '__s8'; did you mean '__u8'?
    2523 |         __s8    link_mode_masks_nwords;
         |         ^~~~
         |         __u8


vim +3 drivers/net/phy/realtek_hwmon.c

   > 3	 * HWMON support for Realtek PHY's
     4	 *
   > 5	 * Author: Heiner Kallweit <hkallweit1@gmail.com>
     6	 */
     7
Heiner Kallweit Jan. 11, 2025, 8:52 a.m. UTC | #6
On 11.01.2025 01:20, Guenter Roeck wrote:
> On 1/10/25 13:41, Heiner Kallweit wrote:
>> On 10.01.2025 22:10, Andrew Lunn wrote:
>>>> - over-temp alarm remains set, even if temperature drops below threshold
>>>
>>>> +int rtl822x_hwmon_init(struct phy_device *phydev)
>>>> +{
>>>> +    struct device *hwdev, *dev = &phydev->mdio.dev;
>>>> +    const char *name;
>>>> +
>>>> +    /* Ensure over-temp alarm is reset. */
>>>> +    phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3);
>>>
>>> So it is possible to clear the alarm.
>>>
>>> I know you wanted to experiment with this some more....
>>>
>>> If the alarm is still set, does that prevent the PHY renegotiating the
>>> higher link speed? If you clear the alarm, does that allow it to
>>> renegotiate the higher link speed? Or is a down/up still required?
>>> Does an down/up clear the alarm if the temperature is below the
>>> threshold?
>>>
>> I tested wrt one of your previous questions, when exceeding the
>> temperature threshold the chip actually removes 2.5Gbps from the
>> advertisement register.
>>
>> If the alarm is set, the chip won't switch back automatically to
>> 2.5Gbps even if the temperature drops below the alarm threshold.
>>
>> When clearing the alarm the chip adds 2.5Gbps back to the advertisement
>> register. Worth to be mentioned:
>> The temperature is checked only if the link speed is 2.5Gbps.
>> Therefore the chip thinks it's safe to add back the 2.5Gbps mode
>> when the alarm is cleared.
>>
>> What I didn't test is whether it's possible to manually add 2.5Gbps
>> to the advertisement register whilst the alarm is set.
>> But I assume that's the case.
>>
>>> Also, does HWMON support clearing alarms? Writing a 0 to the file? Or
>>> are they supported to self clear on read?
>>>
>> Documentation/hwmon/sysfs-interface.rst states that the alarm
>> is a read-only attribute:
>>
>> +-------------------------------+-----------------------+
>> | **`in[0-*]_alarm`,        | Channel alarm        |
>> | `curr[1-*]_alarm`,        |            |
>> | `power[1-*]_alarm`,        |   - 0: no alarm    |
>> | `fan[1-*]_alarm`,        |   - 1: alarm        |
>> | `temp[1-*]_alarm`**        |            |
>> |                |   RO            |
>> +-------------------------------+-----------------------+
>>
>> Self-clearing is neither mentioned in the documentation nor
>> implemented in hwmon core.
> 
> I would argue that self clearing is implied in "RO". This isn't a hwmon
> core problem, it needs to be implemented in drivers. Many chips auto-clear
> alarm attributes on read. For those this is automatic. Others need
> to explicitly implement clearing alarms.
> 
Thanks a lot for the clarifications. Wrt RO and self-clearing see following
snippet from a PHY datasheet. These namings are quite common IMO. 
I think using RC in the alarm attribute description would be clearer.

Type Description
LH Latch high. If the status is high, this field is set to ‘1’ and remains set.
RC Read-cleared. The register field is cleared after read.
RO Read only.
WO Write only.
RW Read and Write.
SC Self-cleared. Writing a ‘1’ to this register field causes the function to be activated immediately, and then
the field will be automatically cleared to ‘0’.


>>
>> @Guenter:
>> If alarm would just mean "current value > alarm threshold", then we
>> wouldn't need an extra alarm attribute, as this is something which
>> can be checked in user space.
> 
> Alarm attributes, if implemented properly and if a chip supports interrupts,
> should generate sysfs and udev events to inform userspace. An alarm
> doesn't just mean "current value > alarm threshold", it can also mean that
> the current value was above the threshold at some point since the attribute
> was read the last time. For that to work, the attribute must be sticky
> until read.
> 
> FWIW, I am sure you'll find lots of drivers not implementing this properly,
> so there is no need to search for those and use them as precedent.
> 
> If you want to support alarm attributes or not is obviously your call,
> but they should be self clearing if implemented. I don't want to get complaints
> along the line of "the alarm attribute is set but doesn't clear even though
> the temperature (or voltage, or whatever) is below the threshold".
> 
>> Has it ever been considered that a user may have to explicitly ack
>> an alarm to clear it? Would you consider it an ABI violation if
>> alarm is configured as R/W for being able to clear the alarm?
>>
> 
> Yes.
> 
> Guenter
> 
>>> I'm wondering if we are heading towards ABI issues? You have defined:
>>>
>>> - over-temp alarm remains set, even if temperature drops below threshold
>>>
>>> so that kind of eliminates the possibility of implementing self
>>> clearing any time in the future. Explicit clearing via a write is
>>> probably O.K, because the user needs to take an explicit action.  Are
>>> there other ABI issues i have not thought about.
>>>
>>>     Andrew
>>
>> Heiner
>
kernel test robot Jan. 11, 2025, 9:48 a.m. UTC | #7
Hi Heiner,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]

url:    https://github.com/intel-lab-lkp/linux/commits/Heiner-Kallweit/net-phy-realtek-add-support-for-reading-MDIO_MMD_VEND2-regs-on-RTL8125-RTL8126/20250110-195043
base:   net-next/main
patch link:    https://lore.kernel.org/r/dbfeb139-808f-4345-afe8-830b7f4da26a%40gmail.com
patch subject: [PATCH net-next 3/3] net: phy: realtek: add hwmon support for temp sensor on RTL822x
config: i386-randconfig-013-20250111 (https://download.01.org/0day-ci/archive/20250111/202501111711.i1W0TGwl-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250111/202501111711.i1W0TGwl-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501111711.i1W0TGwl-lkp@intel.com/

All errors (new ones prefixed by >>):

>> drivers/net/phy/realtek_hwmon.c:3:4: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int]
       2 |  *
         |  int
       3 |  * HWMON support for Realtek PHY's
         |    ^
>> drivers/net/phy/realtek_hwmon.c:3:9: error: expected ';' after top level declarator
       3 |  * HWMON support for Realtek PHY's
         |         ^
         |         ;
   drivers/net/phy/realtek_hwmon.c:3:33: warning: missing terminating ' character [-Winvalid-pp-token]
       3 |  * HWMON support for Realtek PHY's
         |                                 ^
   In file included from drivers/net/phy/realtek_hwmon.c:8:
   In file included from include/linux/hwmon.h:15:
   In file included from include/linux/bitops.h:5:
   In file included from ./arch/x86/include/generated/uapi/asm/types.h:1:
   In file included from include/uapi/asm-generic/types.h:7:
   include/asm-generic/int-ll64.h:16:9: error: unknown type name '__s8'; did you mean '__u8'?
      16 | typedef __s8  s8;
         |         ^~~~
         |         __u8
   include/uapi/asm-generic/int-ll64.h:21:23: note: '__u8' declared here
      21 | typedef unsigned char __u8;
         |                       ^
   In file included from drivers/net/phy/realtek_hwmon.c:9:
   In file included from include/linux/phy.h:15:
   In file included from include/linux/spinlock.h:60:
   In file included from include/linux/thread_info.h:60:
   In file included from arch/x86/include/asm/thread_info.h:59:
   In file included from arch/x86/include/asm/cpufeature.h:5:
   In file included from arch/x86/include/asm/processor.h:35:
   In file included from include/linux/math64.h:6:
   include/linux/math.h:115:1: error: unknown type name '__s8'; did you mean '__u8'?
     115 | __STRUCT_FRACT(s8)
         | ^
   include/linux/math.h:112:2: note: expanded from macro '__STRUCT_FRACT'
     112 |         __##type numerator;                             \
         |         ^
   <scratch space>:221:1: note: expanded from here
     221 | __s8
         | ^
   include/uapi/asm-generic/int-ll64.h:21:23: note: '__u8' declared here
      21 | typedef unsigned char __u8;
         |                       ^
   In file included from drivers/net/phy/realtek_hwmon.c:9:
   In file included from include/linux/phy.h:15:
   In file included from include/linux/spinlock.h:60:
   In file included from include/linux/thread_info.h:60:
   In file included from arch/x86/include/asm/thread_info.h:59:
   In file included from arch/x86/include/asm/cpufeature.h:5:
   In file included from arch/x86/include/asm/processor.h:35:
   In file included from include/linux/math64.h:6:
   include/linux/math.h:115:1: error: unknown type name '__s8'; did you mean '__u8'?
     115 | __STRUCT_FRACT(s8)
         | ^
   include/linux/math.h:113:2: note: expanded from macro '__STRUCT_FRACT'
     113 |         __##type denominator;                           \
         |         ^
   <scratch space>:222:1: note: expanded from here
     222 | __s8
         | ^
   include/uapi/asm-generic/int-ll64.h:21:23: note: '__u8' declared here
      21 | typedef unsigned char __u8;
         |                       ^
   In file included from drivers/net/phy/realtek_hwmon.c:9:
   In file included from include/linux/phy.h:16:
   In file included from include/linux/ethtool.h:17:
   In file included from include/linux/compat.h:17:
   In file included from include/linux/fs.h:33:
   In file included from include/linux/percpu-rwsem.h:7:
   In file included from include/linux/rcuwait.h:6:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:98:11: warning: array index 3 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
      98 |                 return (set->sig[3] | set->sig[2] |
         |                         ^        ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from drivers/net/phy/realtek_hwmon.c:9:
   In file included from include/linux/phy.h:16:
   In file included from include/linux/ethtool.h:17:
   In file included from include/linux/compat.h:17:
   In file included from include/linux/fs.h:33:
   In file included from include/linux/percpu-rwsem.h:7:
   In file included from include/linux/rcuwait.h:6:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:98:25: warning: array index 2 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds]
      98 |                 return (set->sig[3] | set->sig[2] |
         |                                       ^        ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^
   In file included from drivers/net/phy/realtek_hwmon.c:9:
   In file included from include/linux/phy.h:16:
   In file included from include/linux/ethtool.h:17:
   In file included from include/linux/compat.h:17:
   In file included from include/linux/fs.h:33:
   In file included from include/linux/percpu-rwsem.h:7:
   In file included from include/linux/rcuwait.h:6:
   In file included from include/linux/sched/signal.h:6:
   include/linux/signal.h:114:11: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds]
     114 |                 return  (set1->sig[3] == set2->sig[3]) &&
         |                          ^         ~
   arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here
      24 |         unsigned long sig[_NSIG_WORDS];
         |         ^


vim +/int +3 drivers/net/phy/realtek_hwmon.c

   > 3	 * HWMON support for Realtek PHY's
     4	 *
     5	 * Author: Heiner Kallweit <hkallweit1@gmail.com>
     6	 */
     7
Heiner Kallweit Jan. 11, 2025, 10:16 a.m. UTC | #8
On 10.01.2025 22:10, Andrew Lunn wrote:
>> - over-temp alarm remains set, even if temperature drops below threshold
> 
>> +int rtl822x_hwmon_init(struct phy_device *phydev)
>> +{
>> +	struct device *hwdev, *dev = &phydev->mdio.dev;
>> +	const char *name;
>> +
>> +	/* Ensure over-temp alarm is reset. */
>> +	phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3);
> 
> So it is possible to clear the alarm.
> 
> I know you wanted to experiment with this some more....
> 
> If the alarm is still set, does that prevent the PHY renegotiating the
> higher link speed? If you clear the alarm, does that allow it to
> renegotiate the higher link speed? Or is a down/up still required?
> Does an down/up clear the alarm if the temperature is below the
> threshold?
> 
> Also, does HWMON support clearing alarms? Writing a 0 to the file? Or
> are they supported to self clear on read?
> 
> I'm wondering if we are heading towards ABI issues? You have defined:
> 
> - over-temp alarm remains set, even if temperature drops below threshold
> 
> so that kind of eliminates the possibility of implementing self
> clearing any time in the future. Explicit clearing via a write is
> probably O.K, because the user needs to take an explicit action.  Are
> there other ABI issues i have not thought about.
> 

According to Guenters feedback the alarm attribute must not be written
and is expected to be self-clearing on read.
If we would clear the alarm in the chip on alarm attribute read, then
we can have the following ugly scenario:

1. Temperature threshold is exceeded and chip reduces speed to 1Gbps
2. Temperature is falling below alarm threshold
3. User uses "sensors" to check the current temperature
4. The implicit alarm attribute read causes the chip to clear the
   alarm and re-enable 2.5Gbps speed, resulting in the temperature
   alarm threshold being exceeded very soon again.

What isn't nice here is that it's not transparent to the user that
a read-only command from his perspective causes the protective measure
of the chip to be cancelled.

There's no existing hwmon attribute meant to be used by the user
to clear a hw alarm once he took measures to protect the chip
from overheating.

> 	Andrew

Heiner
Andrew Lunn Jan. 11, 2025, 5 p.m. UTC | #9
> According to Guenters feedback the alarm attribute must not be written
> and is expected to be self-clearing on read.
> If we would clear the alarm in the chip on alarm attribute read, then
> we can have the following ugly scenario:
> 
> 1. Temperature threshold is exceeded and chip reduces speed to 1Gbps
> 2. Temperature is falling below alarm threshold
> 3. User uses "sensors" to check the current temperature
> 4. The implicit alarm attribute read causes the chip to clear the
>    alarm and re-enable 2.5Gbps speed, resulting in the temperature
>    alarm threshold being exceeded very soon again.
> 
> What isn't nice here is that it's not transparent to the user that
> a read-only command from his perspective causes the protective measure
> of the chip to be cancelled.
> 
> There's no existing hwmon attribute meant to be used by the user
> to clear a hw alarm once he took measures to protect the chip
> from overheating.

It is generally not the kernels job to implement policy. User space
should be doing that.

I see two different possible policies, and there are maybe others:

1) The user is happy with one second outages every so often as the
chip cycles between too hot and down shifting, and cool enough to
upshift back to the higher speeds.

2) The user prefers to have reliable, slower connectivity and needs to
explicitly do something like down/up the interface to get it back to
the higher speed.

I personally would say, from a user support view, 2) is better. A one
time 1 second break in connectivity and a kernel message is going to
cause less issues.

Maybe the solution is that the hwmon alarm attribute is not directly
the hardware bit, but a software interpretation of the system state.
When the alarm fires, copy it into a software alarm state, but leave
the hardware alarm alone. A hwmon read clears the software state, but
leaves the hardware alone. A down/up of the interface will then clear
both the software and hardware alarm state.

Anybody wanting policy 1) would then need a daemon polling the state
and taking action. 2) would be the default.

How easy is it for you to get into the alarm state? Did you need an
environment chamber/oven, or is it happening for you with just lots of
continuous traffic at typical room temperature? Are we talking about
cheap USB dangles in a sealed plastic case with poor thermal design
are going to be doing this all the time?

	Andrew
Heiner Kallweit Jan. 11, 2025, 5:32 p.m. UTC | #10
On 11.01.2025 18:00, Andrew Lunn wrote:
>> According to Guenters feedback the alarm attribute must not be written
>> and is expected to be self-clearing on read.
>> If we would clear the alarm in the chip on alarm attribute read, then
>> we can have the following ugly scenario:
>>
>> 1. Temperature threshold is exceeded and chip reduces speed to 1Gbps
>> 2. Temperature is falling below alarm threshold
>> 3. User uses "sensors" to check the current temperature
>> 4. The implicit alarm attribute read causes the chip to clear the
>>    alarm and re-enable 2.5Gbps speed, resulting in the temperature
>>    alarm threshold being exceeded very soon again.
>>
>> What isn't nice here is that it's not transparent to the user that
>> a read-only command from his perspective causes the protective measure
>> of the chip to be cancelled.
>>
>> There's no existing hwmon attribute meant to be used by the user
>> to clear a hw alarm once he took measures to protect the chip
>> from overheating.
> 
> It is generally not the kernels job to implement policy. User space
> should be doing that.
> 
> I see two different possible policies, and there are maybe others:
> 
> 1) The user is happy with one second outages every so often as the
> chip cycles between too hot and down shifting, and cool enough to
> upshift back to the higher speeds.
> 
> 2) The user prefers to have reliable, slower connectivity and needs to
> explicitly do something like down/up the interface to get it back to
> the higher speed.
> 
This seems to be exactly how I do it currently.

> I personally would say, from a user support view, 2) is better. A one
> time 1 second break in connectivity and a kernel message is going to
> cause less issues.
> 
> Maybe the solution is that the hwmon alarm attribute is not directly
> the hardware bit, but a software interpretation of the system state.
> When the alarm fires, copy it into a software alarm state, but leave
> the hardware alarm alone. A hwmon read clears the software state, but
> leaves the hardware alone. A down/up of the interface will then clear
> both the software and hardware alarm state.
> 
Not clearing the alarm on read is better from a user perspective IMO
(at least for this specific PHY).
As long as the alarm is active, the chip forces a downshift. 

> Anybody wanting policy 1) would then need a daemon polling the state
> and taking action. 2) would be the default.
> 
> How easy is it for you to get into the alarm state? Did you need an
> environment chamber/oven, or is it happening for you with just lots of
> continuous traffic at typical room temperature? Are we talking about
> cheap USB dangles in a sealed plastic case with poor thermal design
> are going to be doing this all the time?
> 
I have a M.2 card with RTL8126 (w/o heat sink) and an external RJ45 port.
This card sits in a slot underneath the mainboard of a mini PC. At 2.5Gbps
it makes a big difference whether EEE is active. With EEE it reaches 54°C,
w/o EEE temperature quickly goes over 70°C. For tests I add a PHY write
to the code which sets the over-temp threshold to 60°C. Then I can
easily trigger overheating by disabling EEE.

On my system the over-temp threshold set by the BIOS (?) is 120°C.
Even w/o heat sink I can hardly imagine that this threshold is ever
reached.

> 	Andrew

Heiner
Andrew Lunn Jan. 11, 2025, 5:44 p.m. UTC | #11
> On my system the over-temp threshold set by the BIOS (?) is 120°C.
> Even w/o heat sink I can hardly imagine that this threshold is ever
> reached.

So this is to some extent a theoretical problem, in your setup. So i
would not spend too much time on it.

	Andrew
Heiner Kallweit Jan. 11, 2025, 6:06 p.m. UTC | #12
On 11.01.2025 18:44, Andrew Lunn wrote:
>> On my system the over-temp threshold set by the BIOS (?) is 120°C.
>> Even w/o heat sink I can hardly imagine that this threshold is ever
>> reached.
> 
> So this is to some extent a theoretical problem, in your setup. So i
> would not spend too much time on it.
> 
Yes, I think I'll omit the alarm feature and just expose current
temperature and over-temp threshold.

> 	Andrew

Heiner
diff mbox series

Patch

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index dc625f2b3..eae53ef88 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -355,6 +355,12 @@  config REALTEK_PHY
 	help
 	  Supports the Realtek 821x PHY.
 
+config REALTEK_PHY_HWMON
+	def_bool REALTEK_PHY && HWMON
+	depends on !(REALTEK_PHY=y && HWMON=m)
+	help
+	  Optional hwmon support for the temperature sensor
+
 config RENESAS_PHY
 	tristate "Renesas PHYs"
 	help
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index ec480e733..ca369a5c9 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -96,6 +96,7 @@  obj-$(CONFIG_NXP_TJA11XX_PHY)	+= nxp-tja11xx.o
 obj-y				+= qcom/
 obj-$(CONFIG_QSEMI_PHY)		+= qsemi.o
 realtek-y += realtek_main.o
+realtek-$(CONFIG_REALTEK_PHY_HWMON) += realtek_hwmon.o
 obj-$(CONFIG_REALTEK_PHY)	+= realtek.o
 obj-$(CONFIG_RENESAS_PHY)	+= uPD60620.o
 obj-$(CONFIG_ROCKCHIP_PHY)	+= rockchip.o
diff --git a/drivers/net/phy/realtek.h b/drivers/net/phy/realtek.h
new file mode 100644
index 000000000..a39b44fa1
--- /dev/null
+++ b/drivers/net/phy/realtek.h
@@ -0,0 +1,10 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef REALTEK_H
+#define REALTEK_H
+
+#include <linux/phy.h>
+
+int rtl822x_hwmon_init(struct phy_device *phydev);
+
+#endif /* REALTEK_H */
diff --git a/drivers/net/phy/realtek_hwmon.c b/drivers/net/phy/realtek_hwmon.c
new file mode 100644
index 000000000..d3284c83f
--- /dev/null
+++ b/drivers/net/phy/realtek_hwmon.c
@@ -0,0 +1,83 @@ 
+// SPDX-License-Identifier: GPL-2.0+
+ *
+ * HWMON support for Realtek PHY's
+ *
+ * Author: Heiner Kallweit <hkallweit1@gmail.com>
+ */
+
+#include <linux/hwmon.h>
+#include <linux/phy.h>
+
+#include "realtek.h"
+
+#define RTL822X_VND2_TSALRM				0xa662
+#define RTL822X_VND2_TSRR				0xbd84
+#define RTL822X_VND2_TSSR				0xb54c
+
+static int rtl822x_hwmon_get_temp(int raw)
+{
+	if (raw >= 512)
+		raw -= 1024;
+
+	return 1000 * raw / 2;
+}
+
+static int rtl822x_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
+			      u32 attr, int channel, long *val)
+{
+	struct phy_device *phydev = dev_get_drvdata(dev);
+	int raw;
+
+	switch (attr) {
+	case hwmon_temp_input:
+		raw = phy_read_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSRR) & 0x3ff;
+		*val = rtl822x_hwmon_get_temp(raw);
+		break;
+	case hwmon_temp_max:
+		/* Chip reduces speed to 1G if threshold is exceeded */
+		raw = phy_read_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSSR) >> 6;
+		*val = rtl822x_hwmon_get_temp(raw);
+		break;
+	case hwmon_temp_alarm:
+		raw = phy_read_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM);
+		*val = !!(raw & 3);
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static const struct hwmon_ops rtl822x_hwmon_ops = {
+	.visible = 0444,
+	.read = rtl822x_hwmon_read,
+};
+
+static const struct hwmon_channel_info * const rtl822x_hwmon_info[] = {
+	HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_ALARM),
+	NULL
+};
+
+static const struct hwmon_chip_info rtl822x_hwmon_chip_info = {
+	.ops = &rtl822x_hwmon_ops,
+	.info = rtl822x_hwmon_info,
+};
+
+int rtl822x_hwmon_init(struct phy_device *phydev)
+{
+	struct device *hwdev, *dev = &phydev->mdio.dev;
+	const char *name;
+
+	/* Ensure over-temp alarm is reset. */
+	phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3);
+
+	name = devm_hwmon_sanitize_name(dev, dev_name(dev));
+	if (IS_ERR(name))
+		return PTR_ERR(name);
+
+	hwdev = devm_hwmon_device_register_with_info(dev, name, phydev,
+						     &rtl822x_hwmon_chip_info,
+						     NULL);
+	return PTR_ERR_OR_ZERO(hwdev);
+}
diff --git a/drivers/net/phy/realtek_main.c b/drivers/net/phy/realtek_main.c
index af9874143..38149958d 100644
--- a/drivers/net/phy/realtek_main.c
+++ b/drivers/net/phy/realtek_main.c
@@ -14,6 +14,8 @@ 
 #include <linux/delay.h>
 #include <linux/clk.h>
 
+#include "realtek.h"
+
 #define RTL821x_PHYSR				0x11
 #define RTL821x_PHYSR_DUPLEX			BIT(13)
 #define RTL821x_PHYSR_SPEED			GENMASK(15, 14)
@@ -820,6 +822,15 @@  static int rtl822x_write_mmd(struct phy_device *phydev, int devnum, u16 regnum,
 	return ret;
 }
 
+static int rtl822x_probe(struct phy_device *phydev)
+{
+	if (IS_ENABLED(CONFIG_REALTEK_PHY_HWMON) &&
+	    phydev->phy_id != RTL_GENERIC_PHYID)
+		return rtl822x_hwmon_init(phydev);
+
+	return 0;
+}
+
 static int rtl822xb_config_init(struct phy_device *phydev)
 {
 	bool has_2500, has_sgmii;
@@ -1519,6 +1530,7 @@  static struct phy_driver realtek_drvs[] = {
 		.match_phy_device = rtl_internal_nbaset_match_phy_device,
 		.name           = "Realtek Internal NBASE-T PHY",
 		.flags		= PHY_IS_INTERNAL,
+		.probe		= rtl822x_probe,
 		.get_features   = rtl822x_get_features,
 		.config_aneg    = rtl822x_config_aneg,
 		.read_status    = rtl822x_read_status,