Message ID | dbfeb139-808f-4345-afe8-830b7f4da26a@gmail.com (mailing list archive) |
---|---|
State | Handled Elsewhere |
Headers | show |
Series | None | expand |
> - over-temp alarm remains set, even if temperature drops below threshold > +int rtl822x_hwmon_init(struct phy_device *phydev) > +{ > + struct device *hwdev, *dev = &phydev->mdio.dev; > + const char *name; > + > + /* Ensure over-temp alarm is reset. */ > + phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3); So it is possible to clear the alarm. I know you wanted to experiment with this some more.... If the alarm is still set, does that prevent the PHY renegotiating the higher link speed? If you clear the alarm, does that allow it to renegotiate the higher link speed? Or is a down/up still required? Does an down/up clear the alarm if the temperature is below the threshold? Also, does HWMON support clearing alarms? Writing a 0 to the file? Or are they supported to self clear on read? I'm wondering if we are heading towards ABI issues? You have defined: - over-temp alarm remains set, even if temperature drops below threshold so that kind of eliminates the possibility of implementing self clearing any time in the future. Explicit clearing via a write is probably O.K, because the user needs to take an explicit action. Are there other ABI issues i have not thought about. Andrew
On 10.01.2025 22:10, Andrew Lunn wrote: >> - over-temp alarm remains set, even if temperature drops below threshold > >> +int rtl822x_hwmon_init(struct phy_device *phydev) >> +{ >> + struct device *hwdev, *dev = &phydev->mdio.dev; >> + const char *name; >> + >> + /* Ensure over-temp alarm is reset. */ >> + phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3); > > So it is possible to clear the alarm. > > I know you wanted to experiment with this some more.... > > If the alarm is still set, does that prevent the PHY renegotiating the > higher link speed? If you clear the alarm, does that allow it to > renegotiate the higher link speed? Or is a down/up still required? > Does an down/up clear the alarm if the temperature is below the > threshold? > I tested wrt one of your previous questions, when exceeding the temperature threshold the chip actually removes 2.5Gbps from the advertisement register. If the alarm is set, the chip won't switch back automatically to 2.5Gbps even if the temperature drops below the alarm threshold. When clearing the alarm the chip adds 2.5Gbps back to the advertisement register. Worth to be mentioned: The temperature is checked only if the link speed is 2.5Gbps. Therefore the chip thinks it's safe to add back the 2.5Gbps mode when the alarm is cleared. What I didn't test is whether it's possible to manually add 2.5Gbps to the advertisement register whilst the alarm is set. But I assume that's the case. > Also, does HWMON support clearing alarms? Writing a 0 to the file? Or > are they supported to self clear on read? > Documentation/hwmon/sysfs-interface.rst states that the alarm is a read-only attribute: +-------------------------------+-----------------------+ | **`in[0-*]_alarm`, | Channel alarm | | `curr[1-*]_alarm`, | | | `power[1-*]_alarm`, | - 0: no alarm | | `fan[1-*]_alarm`, | - 1: alarm | | `temp[1-*]_alarm`** | | | | RO | +-------------------------------+-----------------------+ Self-clearing is neither mentioned in the documentation nor implemented in hwmon core. @Guenter: If alarm would just mean "current value > alarm threshold", then we wouldn't need an extra alarm attribute, as this is something which can be checked in user space. Has it ever been considered that a user may have to explicitly ack an alarm to clear it? Would you consider it an ABI violation if alarm is configured as R/W for being able to clear the alarm? > I'm wondering if we are heading towards ABI issues? You have defined: > > - over-temp alarm remains set, even if temperature drops below threshold > > so that kind of eliminates the possibility of implementing self > clearing any time in the future. Explicit clearing via a write is > probably O.K, because the user needs to take an explicit action. Are > there other ABI issues i have not thought about. > > Andrew Heiner
On 1/10/25 13:10, Andrew Lunn wrote: >> - over-temp alarm remains set, even if temperature drops below threshold > >> +int rtl822x_hwmon_init(struct phy_device *phydev) >> +{ >> + struct device *hwdev, *dev = &phydev->mdio.dev; >> + const char *name; >> + >> + /* Ensure over-temp alarm is reset. */ >> + phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3); > > So it is possible to clear the alarm. > > I know you wanted to experiment with this some more.... > > If the alarm is still set, does that prevent the PHY renegotiating the > higher link speed? If you clear the alarm, does that allow it to > renegotiate the higher link speed? Or is a down/up still required? > Does an down/up clear the alarm if the temperature is below the > threshold? > > Also, does HWMON support clearing alarms? Writing a 0 to the file? Or > are they supported to self clear on read? > Alarm attributes are supposed to self clear on read unless the condition persists. > I'm wondering if we are heading towards ABI issues? You have defined: > > - over-temp alarm remains set, even if temperature drops below threshold > > so that kind of eliminates the possibility of implementing self > clearing any time in the future. Explicit clearing via a write is > probably O.K, because the user needs to take an explicit action. Are > there other ABI issues i have not thought about. > Alarm attributes are supposed to be read-only. Guenter
On 1/10/25 13:41, Heiner Kallweit wrote: > On 10.01.2025 22:10, Andrew Lunn wrote: >>> - over-temp alarm remains set, even if temperature drops below threshold >> >>> +int rtl822x_hwmon_init(struct phy_device *phydev) >>> +{ >>> + struct device *hwdev, *dev = &phydev->mdio.dev; >>> + const char *name; >>> + >>> + /* Ensure over-temp alarm is reset. */ >>> + phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3); >> >> So it is possible to clear the alarm. >> >> I know you wanted to experiment with this some more.... >> >> If the alarm is still set, does that prevent the PHY renegotiating the >> higher link speed? If you clear the alarm, does that allow it to >> renegotiate the higher link speed? Or is a down/up still required? >> Does an down/up clear the alarm if the temperature is below the >> threshold? >> > I tested wrt one of your previous questions, when exceeding the > temperature threshold the chip actually removes 2.5Gbps from the > advertisement register. > > If the alarm is set, the chip won't switch back automatically to > 2.5Gbps even if the temperature drops below the alarm threshold. > > When clearing the alarm the chip adds 2.5Gbps back to the advertisement > register. Worth to be mentioned: > The temperature is checked only if the link speed is 2.5Gbps. > Therefore the chip thinks it's safe to add back the 2.5Gbps mode > when the alarm is cleared. > > What I didn't test is whether it's possible to manually add 2.5Gbps > to the advertisement register whilst the alarm is set. > But I assume that's the case. > >> Also, does HWMON support clearing alarms? Writing a 0 to the file? Or >> are they supported to self clear on read? >> > Documentation/hwmon/sysfs-interface.rst states that the alarm > is a read-only attribute: > > +-------------------------------+-----------------------+ > | **`in[0-*]_alarm`, | Channel alarm | > | `curr[1-*]_alarm`, | | > | `power[1-*]_alarm`, | - 0: no alarm | > | `fan[1-*]_alarm`, | - 1: alarm | > | `temp[1-*]_alarm`** | | > | | RO | > +-------------------------------+-----------------------+ > > Self-clearing is neither mentioned in the documentation nor > implemented in hwmon core. I would argue that self clearing is implied in "RO". This isn't a hwmon core problem, it needs to be implemented in drivers. Many chips auto-clear alarm attributes on read. For those this is automatic. Others need to explicitly implement clearing alarms. > > @Guenter: > If alarm would just mean "current value > alarm threshold", then we > wouldn't need an extra alarm attribute, as this is something which > can be checked in user space. Alarm attributes, if implemented properly and if a chip supports interrupts, should generate sysfs and udev events to inform userspace. An alarm doesn't just mean "current value > alarm threshold", it can also mean that the current value was above the threshold at some point since the attribute was read the last time. For that to work, the attribute must be sticky until read. FWIW, I am sure you'll find lots of drivers not implementing this properly, so there is no need to search for those and use them as precedent. If you want to support alarm attributes or not is obviously your call, but they should be self clearing if implemented. I don't want to get complaints along the line of "the alarm attribute is set but doesn't clear even though the temperature (or voltage, or whatever) is below the threshold". > Has it ever been considered that a user may have to explicitly ack > an alarm to clear it? Would you consider it an ABI violation if > alarm is configured as R/W for being able to clear the alarm? > Yes. Guenter >> I'm wondering if we are heading towards ABI issues? You have defined: >> >> - over-temp alarm remains set, even if temperature drops below threshold >> >> so that kind of eliminates the possibility of implementing self >> clearing any time in the future. Explicit clearing via a write is >> probably O.K, because the user needs to take an explicit action. Are >> there other ABI issues i have not thought about. >> >> Andrew > > Heiner
Hi Heiner, kernel test robot noticed the following build errors: [auto build test ERROR on net-next/main] url: https://github.com/intel-lab-lkp/linux/commits/Heiner-Kallweit/net-phy-realtek-add-support-for-reading-MDIO_MMD_VEND2-regs-on-RTL8125-RTL8126/20250110-195043 base: net-next/main patch link: https://lore.kernel.org/r/dbfeb139-808f-4345-afe8-830b7f4da26a%40gmail.com patch subject: [PATCH net-next 3/3] net: phy: realtek: add hwmon support for temp sensor on RTL822x config: parisc-randconfig-r072-20250111 (https://download.01.org/0day-ci/archive/20250111/202501111618.Lro85HiT-lkp@intel.com/config) compiler: hppa-linux-gcc (GCC) 14.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250111/202501111618.Lro85HiT-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202501111618.Lro85HiT-lkp@intel.com/ All error/warnings (new ones prefixed by >>): >> drivers/net/phy/realtek_hwmon.c:3:10: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'support' 3 | * HWMON support for Realtek PHY's | ^~~~~~~ >> drivers/net/phy/realtek_hwmon.c:3:33: warning: missing terminating ' character 3 | * HWMON support for Realtek PHY's | ^ >> drivers/net/phy/realtek_hwmon.c:3:33: error: missing terminating ' character 3 | * HWMON support for Realtek PHY's | ^~ >> drivers/net/phy/realtek_hwmon.c:5:39: error: stray '@' in program 5 | * Author: Heiner Kallweit <hkallweit1@gmail.com> | ^ In file included from include/uapi/asm-generic/types.h:7, from ./arch/parisc/include/generated/uapi/asm/types.h:1, from include/linux/bitops.h:5, from include/linux/hwmon.h:15, from drivers/net/phy/realtek_hwmon.c:8: >> include/asm-generic/int-ll64.h:16:9: error: unknown type name '__s8'; did you mean '__u8'? 16 | typedef __s8 s8; | ^~~~ | __u8 In file included from include/linux/kernel.h:27, from arch/parisc/include/asm/bug.h:5, from include/linux/bug.h:5, from include/linux/thread_info.h:13, from include/asm-generic/preempt.h:5, from ./arch/parisc/include/generated/asm/preempt.h:1, from include/linux/preempt.h:79, from include/linux/spinlock.h:56, from include/linux/phy.h:15, from drivers/net/phy/realtek_hwmon.c:9: >> include/linux/math.h:112:9: error: unknown type name '__s8'; did you mean '__u8'? 112 | __##type numerator; \ | ^~ include/linux/math.h:115:1: note: in expansion of macro '__STRUCT_FRACT' 115 | __STRUCT_FRACT(s8) | ^~~~~~~~~~~~~~ include/linux/math.h:113:9: error: unknown type name '__s8'; did you mean '__u8'? 113 | __##type denominator; \ | ^~ include/linux/math.h:115:1: note: in expansion of macro '__STRUCT_FRACT' 115 | __STRUCT_FRACT(s8) | ^~~~~~~~~~~~~~ In file included from include/linux/quota.h:42, from include/linux/fs.h:271, from include/linux/compat.h:17, from include/linux/ethtool.h:17, from include/linux/phy.h:16: >> include/uapi/linux/dqblk_xfs.h:54:9: error: unknown type name '__s8'; did you mean '__u8'? 54 | __s8 d_version; /* version of this structure */ | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:55:9: error: unknown type name '__s8'; did you mean '__u8'? 55 | __s8 d_flags; /* FS_{USER,PROJ,GROUP}_QUOTA */ | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:70:9: error: unknown type name '__s8'; did you mean '__u8'? 70 | __s8 d_itimer_hi; /* upper 8 bits of timer values */ | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:71:9: error: unknown type name '__s8'; did you mean '__u8'? 71 | __s8 d_btimer_hi; | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:72:9: error: unknown type name '__s8'; did you mean '__u8'? 72 | __s8 d_rtbtimer_hi; | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:73:9: error: unknown type name '__s8'; did you mean '__u8'? 73 | __s8 d_padding2; /* padding2 - for future use */ | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:166:9: error: unknown type name '__s8'; did you mean '__u8'? 166 | __s8 qs_version; /* version number for future changes */ | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:168:9: error: unknown type name '__s8'; did you mean '__u8'? 168 | __s8 qs_pad; /* unused */ | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:210:9: error: unknown type name '__s8'; did you mean '__u8'? 210 | __s8 qs_version; /* version for future changes */ | ^~~~ | __u8 In file included from include/linux/ethtool.h:20: >> include/uapi/linux/ethtool.h:2523:9: error: unknown type name '__s8'; did you mean '__u8'? 2523 | __s8 link_mode_masks_nwords; | ^~~~ | __u8 -- realtek_hwmon.c:3:10: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'support' 3 | * HWMON support for Realtek PHY's | ^~~~~~~ realtek_hwmon.c:3:33: warning: missing terminating ' character 3 | * HWMON support for Realtek PHY's | ^ realtek_hwmon.c:3:33: error: missing terminating ' character 3 | * HWMON support for Realtek PHY's | ^~ realtek_hwmon.c:5:39: error: stray '@' in program 5 | * Author: Heiner Kallweit <hkallweit1@gmail.com> | ^ In file included from include/uapi/asm-generic/types.h:7, from arch/parisc/include/generated/uapi/asm/types.h:1, from include/linux/bitops.h:5, from include/linux/hwmon.h:15, from realtek_hwmon.c:8: >> include/asm-generic/int-ll64.h:16:9: error: unknown type name '__s8'; did you mean '__u8'? 16 | typedef __s8 s8; | ^~~~ | __u8 In file included from include/linux/kernel.h:27, from arch/parisc/include/asm/bug.h:5, from include/linux/bug.h:5, from include/linux/thread_info.h:13, from include/asm-generic/preempt.h:5, from arch/parisc/include/generated/asm/preempt.h:1, from include/linux/preempt.h:79, from include/linux/spinlock.h:56, from include/linux/phy.h:15, from realtek_hwmon.c:9: >> include/linux/math.h:112:9: error: unknown type name '__s8'; did you mean '__u8'? 112 | __##type numerator; \ | ^~ include/linux/math.h:115:1: note: in expansion of macro '__STRUCT_FRACT' 115 | __STRUCT_FRACT(s8) | ^~~~~~~~~~~~~~ include/linux/math.h:113:9: error: unknown type name '__s8'; did you mean '__u8'? 113 | __##type denominator; \ | ^~ include/linux/math.h:115:1: note: in expansion of macro '__STRUCT_FRACT' 115 | __STRUCT_FRACT(s8) | ^~~~~~~~~~~~~~ In file included from include/linux/quota.h:42, from include/linux/fs.h:271, from include/linux/compat.h:17, from include/linux/ethtool.h:17, from include/linux/phy.h:16: >> include/uapi/linux/dqblk_xfs.h:54:9: error: unknown type name '__s8'; did you mean '__u8'? 54 | __s8 d_version; /* version of this structure */ | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:55:9: error: unknown type name '__s8'; did you mean '__u8'? 55 | __s8 d_flags; /* FS_{USER,PROJ,GROUP}_QUOTA */ | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:70:9: error: unknown type name '__s8'; did you mean '__u8'? 70 | __s8 d_itimer_hi; /* upper 8 bits of timer values */ | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:71:9: error: unknown type name '__s8'; did you mean '__u8'? 71 | __s8 d_btimer_hi; | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:72:9: error: unknown type name '__s8'; did you mean '__u8'? 72 | __s8 d_rtbtimer_hi; | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:73:9: error: unknown type name '__s8'; did you mean '__u8'? 73 | __s8 d_padding2; /* padding2 - for future use */ | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:166:9: error: unknown type name '__s8'; did you mean '__u8'? 166 | __s8 qs_version; /* version number for future changes */ | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:168:9: error: unknown type name '__s8'; did you mean '__u8'? 168 | __s8 qs_pad; /* unused */ | ^~~~ | __u8 include/uapi/linux/dqblk_xfs.h:210:9: error: unknown type name '__s8'; did you mean '__u8'? 210 | __s8 qs_version; /* version for future changes */ | ^~~~ | __u8 In file included from include/linux/ethtool.h:20: >> include/uapi/linux/ethtool.h:2523:9: error: unknown type name '__s8'; did you mean '__u8'? 2523 | __s8 link_mode_masks_nwords; | ^~~~ | __u8 vim +3 drivers/net/phy/realtek_hwmon.c > 3 * HWMON support for Realtek PHY's 4 * > 5 * Author: Heiner Kallweit <hkallweit1@gmail.com> 6 */ 7
On 11.01.2025 01:20, Guenter Roeck wrote: > On 1/10/25 13:41, Heiner Kallweit wrote: >> On 10.01.2025 22:10, Andrew Lunn wrote: >>>> - over-temp alarm remains set, even if temperature drops below threshold >>> >>>> +int rtl822x_hwmon_init(struct phy_device *phydev) >>>> +{ >>>> + struct device *hwdev, *dev = &phydev->mdio.dev; >>>> + const char *name; >>>> + >>>> + /* Ensure over-temp alarm is reset. */ >>>> + phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3); >>> >>> So it is possible to clear the alarm. >>> >>> I know you wanted to experiment with this some more.... >>> >>> If the alarm is still set, does that prevent the PHY renegotiating the >>> higher link speed? If you clear the alarm, does that allow it to >>> renegotiate the higher link speed? Or is a down/up still required? >>> Does an down/up clear the alarm if the temperature is below the >>> threshold? >>> >> I tested wrt one of your previous questions, when exceeding the >> temperature threshold the chip actually removes 2.5Gbps from the >> advertisement register. >> >> If the alarm is set, the chip won't switch back automatically to >> 2.5Gbps even if the temperature drops below the alarm threshold. >> >> When clearing the alarm the chip adds 2.5Gbps back to the advertisement >> register. Worth to be mentioned: >> The temperature is checked only if the link speed is 2.5Gbps. >> Therefore the chip thinks it's safe to add back the 2.5Gbps mode >> when the alarm is cleared. >> >> What I didn't test is whether it's possible to manually add 2.5Gbps >> to the advertisement register whilst the alarm is set. >> But I assume that's the case. >> >>> Also, does HWMON support clearing alarms? Writing a 0 to the file? Or >>> are they supported to self clear on read? >>> >> Documentation/hwmon/sysfs-interface.rst states that the alarm >> is a read-only attribute: >> >> +-------------------------------+-----------------------+ >> | **`in[0-*]_alarm`, | Channel alarm | >> | `curr[1-*]_alarm`, | | >> | `power[1-*]_alarm`, | - 0: no alarm | >> | `fan[1-*]_alarm`, | - 1: alarm | >> | `temp[1-*]_alarm`** | | >> | | RO | >> +-------------------------------+-----------------------+ >> >> Self-clearing is neither mentioned in the documentation nor >> implemented in hwmon core. > > I would argue that self clearing is implied in "RO". This isn't a hwmon > core problem, it needs to be implemented in drivers. Many chips auto-clear > alarm attributes on read. For those this is automatic. Others need > to explicitly implement clearing alarms. > Thanks a lot for the clarifications. Wrt RO and self-clearing see following snippet from a PHY datasheet. These namings are quite common IMO. I think using RC in the alarm attribute description would be clearer. Type Description LH Latch high. If the status is high, this field is set to ‘1’ and remains set. RC Read-cleared. The register field is cleared after read. RO Read only. WO Write only. RW Read and Write. SC Self-cleared. Writing a ‘1’ to this register field causes the function to be activated immediately, and then the field will be automatically cleared to ‘0’. >> >> @Guenter: >> If alarm would just mean "current value > alarm threshold", then we >> wouldn't need an extra alarm attribute, as this is something which >> can be checked in user space. > > Alarm attributes, if implemented properly and if a chip supports interrupts, > should generate sysfs and udev events to inform userspace. An alarm > doesn't just mean "current value > alarm threshold", it can also mean that > the current value was above the threshold at some point since the attribute > was read the last time. For that to work, the attribute must be sticky > until read. > > FWIW, I am sure you'll find lots of drivers not implementing this properly, > so there is no need to search for those and use them as precedent. > > If you want to support alarm attributes or not is obviously your call, > but they should be self clearing if implemented. I don't want to get complaints > along the line of "the alarm attribute is set but doesn't clear even though > the temperature (or voltage, or whatever) is below the threshold". > >> Has it ever been considered that a user may have to explicitly ack >> an alarm to clear it? Would you consider it an ABI violation if >> alarm is configured as R/W for being able to clear the alarm? >> > > Yes. > > Guenter > >>> I'm wondering if we are heading towards ABI issues? You have defined: >>> >>> - over-temp alarm remains set, even if temperature drops below threshold >>> >>> so that kind of eliminates the possibility of implementing self >>> clearing any time in the future. Explicit clearing via a write is >>> probably O.K, because the user needs to take an explicit action. Are >>> there other ABI issues i have not thought about. >>> >>> Andrew >> >> Heiner >
Hi Heiner, kernel test robot noticed the following build errors: [auto build test ERROR on net-next/main] url: https://github.com/intel-lab-lkp/linux/commits/Heiner-Kallweit/net-phy-realtek-add-support-for-reading-MDIO_MMD_VEND2-regs-on-RTL8125-RTL8126/20250110-195043 base: net-next/main patch link: https://lore.kernel.org/r/dbfeb139-808f-4345-afe8-830b7f4da26a%40gmail.com patch subject: [PATCH net-next 3/3] net: phy: realtek: add hwmon support for temp sensor on RTL822x config: i386-randconfig-013-20250111 (https://download.01.org/0day-ci/archive/20250111/202501111711.i1W0TGwl-lkp@intel.com/config) compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250111/202501111711.i1W0TGwl-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202501111711.i1W0TGwl-lkp@intel.com/ All errors (new ones prefixed by >>): >> drivers/net/phy/realtek_hwmon.c:3:4: error: type specifier missing, defaults to 'int'; ISO C99 and later do not support implicit int [-Wimplicit-int] 2 | * | int 3 | * HWMON support for Realtek PHY's | ^ >> drivers/net/phy/realtek_hwmon.c:3:9: error: expected ';' after top level declarator 3 | * HWMON support for Realtek PHY's | ^ | ; drivers/net/phy/realtek_hwmon.c:3:33: warning: missing terminating ' character [-Winvalid-pp-token] 3 | * HWMON support for Realtek PHY's | ^ In file included from drivers/net/phy/realtek_hwmon.c:8: In file included from include/linux/hwmon.h:15: In file included from include/linux/bitops.h:5: In file included from ./arch/x86/include/generated/uapi/asm/types.h:1: In file included from include/uapi/asm-generic/types.h:7: include/asm-generic/int-ll64.h:16:9: error: unknown type name '__s8'; did you mean '__u8'? 16 | typedef __s8 s8; | ^~~~ | __u8 include/uapi/asm-generic/int-ll64.h:21:23: note: '__u8' declared here 21 | typedef unsigned char __u8; | ^ In file included from drivers/net/phy/realtek_hwmon.c:9: In file included from include/linux/phy.h:15: In file included from include/linux/spinlock.h:60: In file included from include/linux/thread_info.h:60: In file included from arch/x86/include/asm/thread_info.h:59: In file included from arch/x86/include/asm/cpufeature.h:5: In file included from arch/x86/include/asm/processor.h:35: In file included from include/linux/math64.h:6: include/linux/math.h:115:1: error: unknown type name '__s8'; did you mean '__u8'? 115 | __STRUCT_FRACT(s8) | ^ include/linux/math.h:112:2: note: expanded from macro '__STRUCT_FRACT' 112 | __##type numerator; \ | ^ <scratch space>:221:1: note: expanded from here 221 | __s8 | ^ include/uapi/asm-generic/int-ll64.h:21:23: note: '__u8' declared here 21 | typedef unsigned char __u8; | ^ In file included from drivers/net/phy/realtek_hwmon.c:9: In file included from include/linux/phy.h:15: In file included from include/linux/spinlock.h:60: In file included from include/linux/thread_info.h:60: In file included from arch/x86/include/asm/thread_info.h:59: In file included from arch/x86/include/asm/cpufeature.h:5: In file included from arch/x86/include/asm/processor.h:35: In file included from include/linux/math64.h:6: include/linux/math.h:115:1: error: unknown type name '__s8'; did you mean '__u8'? 115 | __STRUCT_FRACT(s8) | ^ include/linux/math.h:113:2: note: expanded from macro '__STRUCT_FRACT' 113 | __##type denominator; \ | ^ <scratch space>:222:1: note: expanded from here 222 | __s8 | ^ include/uapi/asm-generic/int-ll64.h:21:23: note: '__u8' declared here 21 | typedef unsigned char __u8; | ^ In file included from drivers/net/phy/realtek_hwmon.c:9: In file included from include/linux/phy.h:16: In file included from include/linux/ethtool.h:17: In file included from include/linux/compat.h:17: In file included from include/linux/fs.h:33: In file included from include/linux/percpu-rwsem.h:7: In file included from include/linux/rcuwait.h:6: In file included from include/linux/sched/signal.h:6: include/linux/signal.h:98:11: warning: array index 3 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds] 98 | return (set->sig[3] | set->sig[2] | | ^ ~ arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here 24 | unsigned long sig[_NSIG_WORDS]; | ^ In file included from drivers/net/phy/realtek_hwmon.c:9: In file included from include/linux/phy.h:16: In file included from include/linux/ethtool.h:17: In file included from include/linux/compat.h:17: In file included from include/linux/fs.h:33: In file included from include/linux/percpu-rwsem.h:7: In file included from include/linux/rcuwait.h:6: In file included from include/linux/sched/signal.h:6: include/linux/signal.h:98:25: warning: array index 2 is past the end of the array (that has type 'unsigned long[2]') [-Warray-bounds] 98 | return (set->sig[3] | set->sig[2] | | ^ ~ arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here 24 | unsigned long sig[_NSIG_WORDS]; | ^ In file included from drivers/net/phy/realtek_hwmon.c:9: In file included from include/linux/phy.h:16: In file included from include/linux/ethtool.h:17: In file included from include/linux/compat.h:17: In file included from include/linux/fs.h:33: In file included from include/linux/percpu-rwsem.h:7: In file included from include/linux/rcuwait.h:6: In file included from include/linux/sched/signal.h:6: include/linux/signal.h:114:11: warning: array index 3 is past the end of the array (that has type 'const unsigned long[2]') [-Warray-bounds] 114 | return (set1->sig[3] == set2->sig[3]) && | ^ ~ arch/x86/include/asm/signal.h:24:2: note: array 'sig' declared here 24 | unsigned long sig[_NSIG_WORDS]; | ^ vim +/int +3 drivers/net/phy/realtek_hwmon.c > 3 * HWMON support for Realtek PHY's 4 * 5 * Author: Heiner Kallweit <hkallweit1@gmail.com> 6 */ 7
On 10.01.2025 22:10, Andrew Lunn wrote: >> - over-temp alarm remains set, even if temperature drops below threshold > >> +int rtl822x_hwmon_init(struct phy_device *phydev) >> +{ >> + struct device *hwdev, *dev = &phydev->mdio.dev; >> + const char *name; >> + >> + /* Ensure over-temp alarm is reset. */ >> + phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3); > > So it is possible to clear the alarm. > > I know you wanted to experiment with this some more.... > > If the alarm is still set, does that prevent the PHY renegotiating the > higher link speed? If you clear the alarm, does that allow it to > renegotiate the higher link speed? Or is a down/up still required? > Does an down/up clear the alarm if the temperature is below the > threshold? > > Also, does HWMON support clearing alarms? Writing a 0 to the file? Or > are they supported to self clear on read? > > I'm wondering if we are heading towards ABI issues? You have defined: > > - over-temp alarm remains set, even if temperature drops below threshold > > so that kind of eliminates the possibility of implementing self > clearing any time in the future. Explicit clearing via a write is > probably O.K, because the user needs to take an explicit action. Are > there other ABI issues i have not thought about. > According to Guenters feedback the alarm attribute must not be written and is expected to be self-clearing on read. If we would clear the alarm in the chip on alarm attribute read, then we can have the following ugly scenario: 1. Temperature threshold is exceeded and chip reduces speed to 1Gbps 2. Temperature is falling below alarm threshold 3. User uses "sensors" to check the current temperature 4. The implicit alarm attribute read causes the chip to clear the alarm and re-enable 2.5Gbps speed, resulting in the temperature alarm threshold being exceeded very soon again. What isn't nice here is that it's not transparent to the user that a read-only command from his perspective causes the protective measure of the chip to be cancelled. There's no existing hwmon attribute meant to be used by the user to clear a hw alarm once he took measures to protect the chip from overheating. > Andrew Heiner
> According to Guenters feedback the alarm attribute must not be written > and is expected to be self-clearing on read. > If we would clear the alarm in the chip on alarm attribute read, then > we can have the following ugly scenario: > > 1. Temperature threshold is exceeded and chip reduces speed to 1Gbps > 2. Temperature is falling below alarm threshold > 3. User uses "sensors" to check the current temperature > 4. The implicit alarm attribute read causes the chip to clear the > alarm and re-enable 2.5Gbps speed, resulting in the temperature > alarm threshold being exceeded very soon again. > > What isn't nice here is that it's not transparent to the user that > a read-only command from his perspective causes the protective measure > of the chip to be cancelled. > > There's no existing hwmon attribute meant to be used by the user > to clear a hw alarm once he took measures to protect the chip > from overheating. It is generally not the kernels job to implement policy. User space should be doing that. I see two different possible policies, and there are maybe others: 1) The user is happy with one second outages every so often as the chip cycles between too hot and down shifting, and cool enough to upshift back to the higher speeds. 2) The user prefers to have reliable, slower connectivity and needs to explicitly do something like down/up the interface to get it back to the higher speed. I personally would say, from a user support view, 2) is better. A one time 1 second break in connectivity and a kernel message is going to cause less issues. Maybe the solution is that the hwmon alarm attribute is not directly the hardware bit, but a software interpretation of the system state. When the alarm fires, copy it into a software alarm state, but leave the hardware alarm alone. A hwmon read clears the software state, but leaves the hardware alone. A down/up of the interface will then clear both the software and hardware alarm state. Anybody wanting policy 1) would then need a daemon polling the state and taking action. 2) would be the default. How easy is it for you to get into the alarm state? Did you need an environment chamber/oven, or is it happening for you with just lots of continuous traffic at typical room temperature? Are we talking about cheap USB dangles in a sealed plastic case with poor thermal design are going to be doing this all the time? Andrew
On 11.01.2025 18:00, Andrew Lunn wrote: >> According to Guenters feedback the alarm attribute must not be written >> and is expected to be self-clearing on read. >> If we would clear the alarm in the chip on alarm attribute read, then >> we can have the following ugly scenario: >> >> 1. Temperature threshold is exceeded and chip reduces speed to 1Gbps >> 2. Temperature is falling below alarm threshold >> 3. User uses "sensors" to check the current temperature >> 4. The implicit alarm attribute read causes the chip to clear the >> alarm and re-enable 2.5Gbps speed, resulting in the temperature >> alarm threshold being exceeded very soon again. >> >> What isn't nice here is that it's not transparent to the user that >> a read-only command from his perspective causes the protective measure >> of the chip to be cancelled. >> >> There's no existing hwmon attribute meant to be used by the user >> to clear a hw alarm once he took measures to protect the chip >> from overheating. > > It is generally not the kernels job to implement policy. User space > should be doing that. > > I see two different possible policies, and there are maybe others: > > 1) The user is happy with one second outages every so often as the > chip cycles between too hot and down shifting, and cool enough to > upshift back to the higher speeds. > > 2) The user prefers to have reliable, slower connectivity and needs to > explicitly do something like down/up the interface to get it back to > the higher speed. > This seems to be exactly how I do it currently. > I personally would say, from a user support view, 2) is better. A one > time 1 second break in connectivity and a kernel message is going to > cause less issues. > > Maybe the solution is that the hwmon alarm attribute is not directly > the hardware bit, but a software interpretation of the system state. > When the alarm fires, copy it into a software alarm state, but leave > the hardware alarm alone. A hwmon read clears the software state, but > leaves the hardware alone. A down/up of the interface will then clear > both the software and hardware alarm state. > Not clearing the alarm on read is better from a user perspective IMO (at least for this specific PHY). As long as the alarm is active, the chip forces a downshift. > Anybody wanting policy 1) would then need a daemon polling the state > and taking action. 2) would be the default. > > How easy is it for you to get into the alarm state? Did you need an > environment chamber/oven, or is it happening for you with just lots of > continuous traffic at typical room temperature? Are we talking about > cheap USB dangles in a sealed plastic case with poor thermal design > are going to be doing this all the time? > I have a M.2 card with RTL8126 (w/o heat sink) and an external RJ45 port. This card sits in a slot underneath the mainboard of a mini PC. At 2.5Gbps it makes a big difference whether EEE is active. With EEE it reaches 54°C, w/o EEE temperature quickly goes over 70°C. For tests I add a PHY write to the code which sets the over-temp threshold to 60°C. Then I can easily trigger overheating by disabling EEE. On my system the over-temp threshold set by the BIOS (?) is 120°C. Even w/o heat sink I can hardly imagine that this threshold is ever reached. > Andrew Heiner
> On my system the over-temp threshold set by the BIOS (?) is 120°C. > Even w/o heat sink I can hardly imagine that this threshold is ever > reached. So this is to some extent a theoretical problem, in your setup. So i would not spend too much time on it. Andrew
On 11.01.2025 18:44, Andrew Lunn wrote: >> On my system the over-temp threshold set by the BIOS (?) is 120°C. >> Even w/o heat sink I can hardly imagine that this threshold is ever >> reached. > > So this is to some extent a theoretical problem, in your setup. So i > would not spend too much time on it. > Yes, I think I'll omit the alarm feature and just expose current temperature and over-temp threshold. > Andrew Heiner
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig index dc625f2b3..eae53ef88 100644 --- a/drivers/net/phy/Kconfig +++ b/drivers/net/phy/Kconfig @@ -355,6 +355,12 @@ config REALTEK_PHY help Supports the Realtek 821x PHY. +config REALTEK_PHY_HWMON + def_bool REALTEK_PHY && HWMON + depends on !(REALTEK_PHY=y && HWMON=m) + help + Optional hwmon support for the temperature sensor + config RENESAS_PHY tristate "Renesas PHYs" help diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile index ec480e733..ca369a5c9 100644 --- a/drivers/net/phy/Makefile +++ b/drivers/net/phy/Makefile @@ -96,6 +96,7 @@ obj-$(CONFIG_NXP_TJA11XX_PHY) += nxp-tja11xx.o obj-y += qcom/ obj-$(CONFIG_QSEMI_PHY) += qsemi.o realtek-y += realtek_main.o +realtek-$(CONFIG_REALTEK_PHY_HWMON) += realtek_hwmon.o obj-$(CONFIG_REALTEK_PHY) += realtek.o obj-$(CONFIG_RENESAS_PHY) += uPD60620.o obj-$(CONFIG_ROCKCHIP_PHY) += rockchip.o diff --git a/drivers/net/phy/realtek.h b/drivers/net/phy/realtek.h new file mode 100644 index 000000000..a39b44fa1 --- /dev/null +++ b/drivers/net/phy/realtek.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef REALTEK_H +#define REALTEK_H + +#include <linux/phy.h> + +int rtl822x_hwmon_init(struct phy_device *phydev); + +#endif /* REALTEK_H */ diff --git a/drivers/net/phy/realtek_hwmon.c b/drivers/net/phy/realtek_hwmon.c new file mode 100644 index 000000000..d3284c83f --- /dev/null +++ b/drivers/net/phy/realtek_hwmon.c @@ -0,0 +1,83 @@ +// SPDX-License-Identifier: GPL-2.0+ + * + * HWMON support for Realtek PHY's + * + * Author: Heiner Kallweit <hkallweit1@gmail.com> + */ + +#include <linux/hwmon.h> +#include <linux/phy.h> + +#include "realtek.h" + +#define RTL822X_VND2_TSALRM 0xa662 +#define RTL822X_VND2_TSRR 0xbd84 +#define RTL822X_VND2_TSSR 0xb54c + +static int rtl822x_hwmon_get_temp(int raw) +{ + if (raw >= 512) + raw -= 1024; + + return 1000 * raw / 2; +} + +static int rtl822x_hwmon_read(struct device *dev, enum hwmon_sensor_types type, + u32 attr, int channel, long *val) +{ + struct phy_device *phydev = dev_get_drvdata(dev); + int raw; + + switch (attr) { + case hwmon_temp_input: + raw = phy_read_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSRR) & 0x3ff; + *val = rtl822x_hwmon_get_temp(raw); + break; + case hwmon_temp_max: + /* Chip reduces speed to 1G if threshold is exceeded */ + raw = phy_read_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSSR) >> 6; + *val = rtl822x_hwmon_get_temp(raw); + break; + case hwmon_temp_alarm: + raw = phy_read_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM); + *val = !!(raw & 3); + break; + default: + return -EINVAL; + } + + return 0; +} + +static const struct hwmon_ops rtl822x_hwmon_ops = { + .visible = 0444, + .read = rtl822x_hwmon_read, +}; + +static const struct hwmon_channel_info * const rtl822x_hwmon_info[] = { + HWMON_CHANNEL_INFO(temp, HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_ALARM), + NULL +}; + +static const struct hwmon_chip_info rtl822x_hwmon_chip_info = { + .ops = &rtl822x_hwmon_ops, + .info = rtl822x_hwmon_info, +}; + +int rtl822x_hwmon_init(struct phy_device *phydev) +{ + struct device *hwdev, *dev = &phydev->mdio.dev; + const char *name; + + /* Ensure over-temp alarm is reset. */ + phy_clear_bits_mmd(phydev, MDIO_MMD_VEND2, RTL822X_VND2_TSALRM, 3); + + name = devm_hwmon_sanitize_name(dev, dev_name(dev)); + if (IS_ERR(name)) + return PTR_ERR(name); + + hwdev = devm_hwmon_device_register_with_info(dev, name, phydev, + &rtl822x_hwmon_chip_info, + NULL); + return PTR_ERR_OR_ZERO(hwdev); +} diff --git a/drivers/net/phy/realtek_main.c b/drivers/net/phy/realtek_main.c index af9874143..38149958d 100644 --- a/drivers/net/phy/realtek_main.c +++ b/drivers/net/phy/realtek_main.c @@ -14,6 +14,8 @@ #include <linux/delay.h> #include <linux/clk.h> +#include "realtek.h" + #define RTL821x_PHYSR 0x11 #define RTL821x_PHYSR_DUPLEX BIT(13) #define RTL821x_PHYSR_SPEED GENMASK(15, 14) @@ -820,6 +822,15 @@ static int rtl822x_write_mmd(struct phy_device *phydev, int devnum, u16 regnum, return ret; } +static int rtl822x_probe(struct phy_device *phydev) +{ + if (IS_ENABLED(CONFIG_REALTEK_PHY_HWMON) && + phydev->phy_id != RTL_GENERIC_PHYID) + return rtl822x_hwmon_init(phydev); + + return 0; +} + static int rtl822xb_config_init(struct phy_device *phydev) { bool has_2500, has_sgmii; @@ -1519,6 +1530,7 @@ static struct phy_driver realtek_drvs[] = { .match_phy_device = rtl_internal_nbaset_match_phy_device, .name = "Realtek Internal NBASE-T PHY", .flags = PHY_IS_INTERNAL, + .probe = rtl822x_probe, .get_features = rtl822x_get_features, .config_aneg = rtl822x_config_aneg, .read_status = rtl822x_read_status,
This adds hwmon support for the temperature sensor on RTL822x. It's available on the standalone versions of the PHY's, and on the integrated PHY's in RTL8125B/RTL8125D/RTL8126. Notes: - over-temp threshold is set by PHY power-on default or BIOS / boot loader and it's read-only - over-temp alarm remains set, even if temperature drops below threshold Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> --- drivers/net/phy/Kconfig | 6 +++ drivers/net/phy/Makefile | 1 + drivers/net/phy/realtek.h | 10 ++++ drivers/net/phy/realtek_hwmon.c | 83 +++++++++++++++++++++++++++++++++ drivers/net/phy/realtek_main.c | 12 +++++ 5 files changed, 112 insertions(+) create mode 100644 drivers/net/phy/realtek.h create mode 100644 drivers/net/phy/realtek_hwmon.c