diff mbox series

[net-next,v2,2/2] net: phy: marvell-88q2xxx: Prevent hwmon access with asserted reset

Message ID 20250220-marvell-88q2xxx-hwmon-enable-at-probe-v2-2-78b2838a62da@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series net: phy: marvell-88q2xxx: Enable temperature measurement in probe again | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 26 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-02-20--18-00 (tests: 893)

Commit Message

Dimitri Fedrau Feb. 20, 2025, 8:11 a.m. UTC
If the PHYs reset is asserted it returns 0xffff for any read operation.
This might happen if the user admins down the interface and wants to read
the temperature. Prevent reading the temperature in this case and return
with an network is down error. Write operations are ignored by the device
when reset is asserted, still return a network is down error in this
case to make the user aware of the operation gone wrong.

Signed-off-by: Dimitri Fedrau <dima.fedrau@gmail.com>
---
 drivers/net/phy/marvell-88q2xxx.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Russell King (Oracle) Feb. 20, 2025, 9:36 a.m. UTC | #1
On Thu, Feb 20, 2025 at 09:11:12AM +0100, Dimitri Fedrau wrote:
> If the PHYs reset is asserted it returns 0xffff for any read operation.
> This might happen if the user admins down the interface and wants to read
> the temperature. Prevent reading the temperature in this case and return
> with an network is down error. Write operations are ignored by the device
> when reset is asserted, still return a network is down error in this
> case to make the user aware of the operation gone wrong.

If we look at where mdio_device_reset() is called from:

1. mdio_device_register() -> mdiobus_register_device() asserts reset
   before adding the device to the device layer (which will then
   cause the driver to be searched for and bound.)

2. mdio_probe(), deasserts the reset signal before calling the MDIO
   driver's ->probe method, which will be phy_probe().

3. after a probe failure to re-assert the reset signal.

4. after ->remove has been called.

That is the sum total. So, while the driver is bound to the device,
phydev->mdio.reset_state is guaranteed to be zero.

Therefore, is this patch fixing a real observed problem with the
current driver?
Dimitri Fedrau Feb. 20, 2025, 3:22 p.m. UTC | #2
Hi Russell,

Am Thu, Feb 20, 2025 at 09:36:49AM +0000 schrieb Russell King (Oracle):
> On Thu, Feb 20, 2025 at 09:11:12AM +0100, Dimitri Fedrau wrote:
> > If the PHYs reset is asserted it returns 0xffff for any read operation.
> > This might happen if the user admins down the interface and wants to read
> > the temperature. Prevent reading the temperature in this case and return
> > with an network is down error. Write operations are ignored by the device
> > when reset is asserted, still return a network is down error in this
> > case to make the user aware of the operation gone wrong.
> 
> If we look at where mdio_device_reset() is called from:
> 
> 1. mdio_device_register() -> mdiobus_register_device() asserts reset
>    before adding the device to the device layer (which will then
>    cause the driver to be searched for and bound.)
> 
> 2. mdio_probe(), deasserts the reset signal before calling the MDIO
>    driver's ->probe method, which will be phy_probe().
> 
> 3. after a probe failure to re-assert the reset signal.
> 
> 4. after ->remove has been called.
> 

There is also phy_device_reset that calls mdio_device_reset.

> That is the sum total. So, while the driver is bound to the device,
> phydev->mdio.reset_state is guaranteed to be zero.
> 
> Therefore, is this patch fixing a real observed problem with the
> current driver?
>
Yes, when I admin up and afterwards down the network device then the PHYs
reset is asserted. In this case phy_detach is called which calls
phy_device_reset(phydev, 1), ...

Best regards,
Dimitri Fedrau
Russell King (Oracle) Feb. 20, 2025, 3:29 p.m. UTC | #3
On Thu, Feb 20, 2025 at 04:22:14PM +0100, Dimitri Fedrau wrote:
> Hi Russell,
> 
> Am Thu, Feb 20, 2025 at 09:36:49AM +0000 schrieb Russell King (Oracle):
> > On Thu, Feb 20, 2025 at 09:11:12AM +0100, Dimitri Fedrau wrote:
> > > If the PHYs reset is asserted it returns 0xffff for any read operation.
> > > This might happen if the user admins down the interface and wants to read
> > > the temperature. Prevent reading the temperature in this case and return
> > > with an network is down error. Write operations are ignored by the device
> > > when reset is asserted, still return a network is down error in this
> > > case to make the user aware of the operation gone wrong.
> > 
> > If we look at where mdio_device_reset() is called from:
> > 
> > 1. mdio_device_register() -> mdiobus_register_device() asserts reset
> >    before adding the device to the device layer (which will then
> >    cause the driver to be searched for and bound.)
> > 
> > 2. mdio_probe(), deasserts the reset signal before calling the MDIO
> >    driver's ->probe method, which will be phy_probe().
> > 
> > 3. after a probe failure to re-assert the reset signal.
> > 
> > 4. after ->remove has been called.
> > 
> 
> There is also phy_device_reset that calls mdio_device_reset.

Ok, thanks for pointing that out.

> > That is the sum total. So, while the driver is bound to the device,
> > phydev->mdio.reset_state is guaranteed to be zero.
> > 
> > Therefore, is this patch fixing a real observed problem with the
> > current driver?
> >
> Yes, when I admin up and afterwards down the network device then the PHYs
> reset is asserted. In this case phy_detach is called which calls
> phy_device_reset(phydev, 1), ...

I'm still concerned that this solution is basically racy - the
netdev can come up/down while hwmon is accessing the device. I'm
also unconvinced that ENETDOWN is a good idea here.

While I get the "describe the hardware" is there a real benefit to
describing this?

What I'm wondering is whether manipulating the reset signal in this
case provides more pain than gain.
Dimitri Fedrau Feb. 20, 2025, 9:05 p.m. UTC | #4
Am Thu, Feb 20, 2025 at 03:29:10PM +0000 schrieb Russell King (Oracle):
> On Thu, Feb 20, 2025 at 04:22:14PM +0100, Dimitri Fedrau wrote:
> > Hi Russell,
> > 
> > Am Thu, Feb 20, 2025 at 09:36:49AM +0000 schrieb Russell King (Oracle):
> > > On Thu, Feb 20, 2025 at 09:11:12AM +0100, Dimitri Fedrau wrote:
> > > > If the PHYs reset is asserted it returns 0xffff for any read operation.
> > > > This might happen if the user admins down the interface and wants to read
> > > > the temperature. Prevent reading the temperature in this case and return
> > > > with an network is down error. Write operations are ignored by the device
> > > > when reset is asserted, still return a network is down error in this
> > > > case to make the user aware of the operation gone wrong.
> > > 
> > > If we look at where mdio_device_reset() is called from:
> > > 
> > > 1. mdio_device_register() -> mdiobus_register_device() asserts reset
> > >    before adding the device to the device layer (which will then
> > >    cause the driver to be searched for and bound.)
> > > 
> > > 2. mdio_probe(), deasserts the reset signal before calling the MDIO
> > >    driver's ->probe method, which will be phy_probe().
> > > 
> > > 3. after a probe failure to re-assert the reset signal.
> > > 
> > > 4. after ->remove has been called.
> > > 
> > 
> > There is also phy_device_reset that calls mdio_device_reset.
> 
> Ok, thanks for pointing that out.
> 
> > > That is the sum total. So, while the driver is bound to the device,
> > > phydev->mdio.reset_state is guaranteed to be zero.
> > > 
> > > Therefore, is this patch fixing a real observed problem with the
> > > current driver?
> > >
> > Yes, when I admin up and afterwards down the network device then the PHYs
> > reset is asserted. In this case phy_detach is called which calls
> > phy_device_reset(phydev, 1), ...
> 
> I'm still concerned that this solution is basically racy - the
> netdev can come up/down while hwmon is accessing the device. I'm
> also unconvinced that ENETDOWN is a good idea here.
>
Yes it is racy. A solution would be to set a flag or whatever in
mdio_device_reset right before changes to the reset line happens and
clear the flag right after the changes have been done. Add a function
that return the state of the flag.
Better go back to EIO ? At first I thought it was a good idea because
the user gets at least a hint what the cause of the error is...

> While I get the "describe the hardware" is there a real benefit to
> describing this?
> 
I can't follow.

> What I'm wondering is whether manipulating the reset signal in this
> case provides more pain than gain.
>
It seems so. I wasn't really aware of it :-)

Best regards,
Dimitri Fedrau
diff mbox series

Patch

diff --git a/drivers/net/phy/marvell-88q2xxx.c b/drivers/net/phy/marvell-88q2xxx.c
index 342a909a12a2785ad579656eb369c69acaace9d1..ea9a2a923146bf432a33ff46b606c08debb69a4f 100644
--- a/drivers/net/phy/marvell-88q2xxx.c
+++ b/drivers/net/phy/marvell-88q2xxx.c
@@ -698,6 +698,12 @@  static int mv88q2xxx_hwmon_read(struct device *dev,
 	struct phy_device *phydev = dev_get_drvdata(dev);
 	int ret;
 
+	/* If the PHYs reset is asserted it returns 0xffff for any read
+	 * operation. Return with an network is down error in this case.
+	 */
+	if (phydev->mdio.reset_state == 1)
+		return -ENETDOWN;
+
 	switch (attr) {
 	case hwmon_temp_input:
 		ret = phy_read_mmd(phydev, MDIO_MMD_PCS,
@@ -737,6 +743,14 @@  static int mv88q2xxx_hwmon_write(struct device *dev,
 {
 	struct phy_device *phydev = dev_get_drvdata(dev);
 
+	/* If the PHYs reset is asserted it ignores any write operation, return
+	 * with an network is down error in this case. Without returning an
+	 * error the user would not know that writing the temperature threshold
+	 * has gone wrong.
+	 */
+	if (phydev->mdio.reset_state == 1)
+		return -ENETDOWN;
+
 	switch (attr) {
 	case hwmon_temp_max:
 		clamp_val(val, -75000, 180000);