diff mbox series

[net-next] net: phy: avoid kernel warning dump when stopping an errored PHY

Message ID E1q17vE-007Baz-8c@rmk-PC.armlinux.org.uk (mailing list archive)
State Accepted
Commit 59088b5a946ee8a6603a9a84781670cedb01c40d
Delegated to: Netdev Maintainers
Headers show
Series [net-next] net: phy: avoid kernel warning dump when stopping an errored PHY | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 408 this patch: 408
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 287 this patch: 287
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 394 this patch: 394
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 62 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Russell King (Oracle) May 22, 2023, 3:58 p.m. UTC
When taking a network interface down (or removing a SFP module) after
the PHY has encountered an error, phy_stop() complains incorrectly
that it was called from HALTED state.

The reason this is incorrect is that the network driver will have
called phy_start() when the interface was brought up, and the fact
that the PHY has a problem bears no relationship to the administrative
state of the interface. Taking the interface administratively down
(which calls phy_stop()) is always the right thing to do after a
successful phy_start() call, whether or not the PHY has encountered
an error.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
---
 drivers/net/phy/phy.c | 11 +++++++----
 include/linux/phy.h   |  7 +++++--
 2 files changed, 12 insertions(+), 6 deletions(-)

Comments

Russell King (Oracle) May 22, 2023, 4:06 p.m. UTC | #1
On Mon, May 22, 2023 at 04:58:08PM +0100, Russell King (Oracle) wrote:
> When taking a network interface down (or removing a SFP module) after
> the PHY has encountered an error, phy_stop() complains incorrectly
> that it was called from HALTED state.
> 
> The reason this is incorrect is that the network driver will have
> called phy_start() when the interface was brought up, and the fact
> that the PHY has a problem bears no relationship to the administrative
> state of the interface. Taking the interface administratively down
> (which calls phy_stop()) is always the right thing to do after a
> successful phy_start() call, whether or not the PHY has encountered
> an error.

Note that I can reproduce this by repeatedly plugging and unplugging any
SFP with a PHY that we access - if one unplugs it while the PHY is being
accessed, phylib logs an error, and then the unplug event happens which
ends up correctly calling phy_stop(), which then spits out a kernel
warning.

One may suggest that this is an unlikely event, but any SFP using the
Rollball I2C protocol to access the PHY, each access can take tens of
milliseconds, which is more than enough time to hit this.
Florian Fainelli May 22, 2023, 7:03 p.m. UTC | #2
On 5/22/23 08:58, Russell King (Oracle) wrote:
> When taking a network interface down (or removing a SFP module) after
> the PHY has encountered an error, phy_stop() complains incorrectly
> that it was called from HALTED state.
> 
> The reason this is incorrect is that the network driver will have
> called phy_start() when the interface was brought up, and the fact
> that the PHY has a problem bears no relationship to the administrative
> state of the interface. Taking the interface administratively down
> (which calls phy_stop()) is always the right thing to do after a
> successful phy_start() call, whether or not the PHY has encountered
> an error.
> 
> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Acked-by: Florian Fainelli <f.fainelli@gmail.com>

I would argue that the entire phy_error() needs a revamp, yes it's 
important to know if we have an error "talking" to the PHY, but there is 
no much that is being actionable in other situations than pluggable SFP 
modules.
patchwork-bot+netdevbpf@kernel.org May 24, 2023, 7:30 a.m. UTC | #3
Hello:

This patch was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Mon, 22 May 2023 16:58:08 +0100 you wrote:
> When taking a network interface down (or removing a SFP module) after
> the PHY has encountered an error, phy_stop() complains incorrectly
> that it was called from HALTED state.
> 
> The reason this is incorrect is that the network driver will have
> called phy_start() when the interface was brought up, and the fact
> that the PHY has a problem bears no relationship to the administrative
> state of the interface. Taking the interface administratively down
> (which calls phy_stop()) is always the right thing to do after a
> successful phy_start() call, whether or not the PHY has encountered
> an error.
> 
> [...]

Here is the summary with links:
  - [net-next] net: phy: avoid kernel warning dump when stopping an errored PHY
    https://git.kernel.org/netdev/net-next/c/59088b5a946e

You are awesome, thank you!
Jijie Shao Sept. 4, 2023, 9:50 a.m. UTC | #4
Hi all,
We encountered an issue when resetting our netdevice recently, it seems
related to this patch.

During our process, we stop phy first and call phy_start() later.
phy_check_link_status returns error because it read mdio failed. The
reason why it happened is that the cmdq is unusable when we reset and we
can't access to mdio.

The process and logs are showed as followed:
Process:
reset process       |    phy_state_machine           |  phy_state
==========================================================================
                     | mutex_lock(&phydev->lock);     | PHY_RUNNING
                     | ...                            |
                     | case PHY_RUNNING:              |
                     | err = phy_check_link_status()  | PHY_RUNNING
                     | ...                            |
                     | mutex_unlock(&phydev->lock)    | PHY_RUNNING
  phy_stop()         |                                |
    ...              |                                |
    mutex_lock()     |                                | PHY_RUNNING
    ...              |                                |
    phydev->state =  |                                |
      PHY_HALTED;    |                                |  PHY_HALTED
    ...              |                                |
    mutex_unlock()   |                                |  PHY_HALTED
                     | phy_error_precise():           |
                     |   mutex_lock(&phydev->lock);   | PHY_HALTED
                     |   phydev->state = PHY_ERROR;   | PHY_ERROR
                     |   mutex_unlock(&phydev->lock); | PHY_ERROR
                     |                                |
phy_start()         |                                |  PHY_ERROR
   ...               |                                |
Logs:
[ 2622.146721] hns3 0000:35:00.0 eth1: Setting reset type 6
[ 2622.155182] hns3 0000:35:00.0: received reset event, reset type is 6
[ 2622.171641] hns3 0000:35:00.0: global reset requested
[ 2622.181867] hns3 0000:35:00.0: global reset interrupt
[ 2623.351382] ------------[ cut here ]------------
[ 2623.358012] phy_check_link_status+0x0/0xe0: returned: -16
[ 2623.370106] hns3 0000:35:00.0 eth1: net stop
[ 2623.377599] WARNING: CPU: 0 PID: 10 at drivers/net/phy/phy.c:1211
phy_state_machine+0xac/0x2b8
[ 2623.386026] RTL8211F Gigabit Ethernet mii-0000:35:00.0:02: PHY state
change RUNNING -> HALTED
                 ...
[ 2623.540165] Call trace:
[ 2623.543034]  phy_state_machine+0xac/0x2b8
[ 2623.548028]  process_one_work+0x1ec/0x478
[ 2623.552732]  worker_thread+0x74/0x448
[ 2623.556855]  kthread+0x120/0x130
[ 2623.560920]  ret_from_fork+0x10/0x20
[ 2623.565355] ---[ end trace 0000000000000000 ]---
[ 2623.577722] RTL8211F Gigabit Ethernet mii-0000:35:00.0:02: PHY state
change RUNNING -> ERROR
[ 2623.590490] hns3 0000:35:00.0 eth1: link down
[ 2623.707230] hns3 0000:35:00.0: prepare wait ok
[ 2624.169139] hns3 0000:35:00.0: The firmware version is 3.10.11.25
[ 2624.501223] hns3 0000:35:00.0: phc initializes ok!
[ 2624.553486] hns3 0000:35:00.0: Reset done, hclge driver initialization
finished.
[ 2625.586470] ------------[ cut here ]------------
[ 2625.593882] called from state ERROR
[ 2625.600677] WARNING: CPU: 1 PID: 352 at drivers/net/phy/phy.c:1392
phy_start+0x50/0xc8
                 ...
[ 2625.750077] Call trace:
[ 2625.752799]  phy_start+0x50/0xc8
[ 2625.756974]  hclge_mac_start_phy+0x34/0x50 [hclge]
                 ...
[ 2625.831224] ---[ end trace 0000000000000000 ]---
[ 2625.843790] hns3 0000:35:00.0 eth1: net open

We supposed to start phy successfully after calling phy_stop. However, the
phy state is PHY_ERROR. As aboved process, we can find
phy_check_link_status is called before phy_stop, but the final phy state
is set due to an error from phy_check_link_status. Becuase we reset our
netdevice successfully, the phy should not be PHY_ERROR when we call
phy_start. So, we supposed it might be a bug.

Additionally, what can we do if the phy is in PHY_ERROR?

Thanks!
Jijie Shao
Andrew Lunn Sept. 4, 2023, 1:43 p.m. UTC | #5
On Mon, Sep 04, 2023 at 05:50:32PM +0800, Jijie Shao wrote:
> Hi all,
> We encountered an issue when resetting our netdevice recently, it seems
> related to this patch.
> 
> During our process, we stop phy first and call phy_start() later.
> phy_check_link_status returns error because it read mdio failed. The
> reason why it happened is that the cmdq is unusable when we reset and we
> can't access to mdio.

At what point in the flow below do you apply the reset which stops
access to the MDIO bus? Ideally you want to do phy_stop(), then apply
the reset, get the hardware working again, and then do a phy_start().

> 
> The process and logs are showed as followed:
> Process:
> reset process       |    phy_state_machine           |  phy_state
> ==========================================================================
>                     | mutex_lock(&phydev->lock);     | PHY_RUNNING
>                     | ...                            |
>                     | case PHY_RUNNING:              |
>                     | err = phy_check_link_status()  | PHY_RUNNING
>                     | ...                            |
>                     | mutex_unlock(&phydev->lock)    | PHY_RUNNING
>  phy_stop()         |                                |
>    ...              |                                |
>    mutex_lock()     |                                | PHY_RUNNING
>    ...              |                                |
>    phydev->state =  |                                |
>      PHY_HALTED;    |                                |  PHY_HALTED
>    ...              |                                |
>    mutex_unlock()   |                                |  PHY_HALTED
>                     | phy_error_precise():           |
>                     |   mutex_lock(&phydev->lock);   | PHY_HALTED
>                     |   phydev->state = PHY_ERROR;   | PHY_ERROR
>                     |   mutex_unlock(&phydev->lock); | PHY_ERROR
>                     |                                |
> phy_start()         |                                |  PHY_ERROR
>   ...               |                                |


	Andrew
Russell King (Oracle) Sept. 4, 2023, 2:42 p.m. UTC | #6
On Mon, Sep 04, 2023 at 05:50:32PM +0800, Jijie Shao wrote:
> Hi all,
> We encountered an issue when resetting our netdevice recently, it seems
> related to this patch.
> 
> During our process, we stop phy first and call phy_start() later.
> phy_check_link_status returns error because it read mdio failed. The
> reason why it happened is that the cmdq is unusable when we reset and we
> can't access to mdio.

Are you suggesting that the sequence is:

phy_stop();
reset netdev
phy_start();

?

Is the reason for doing this because you've already detected an issue
with the hardware, and you're trying to recover it - and before you've
called phy_stop() the hardware is already dead?

If that is the case, I'm not really sure what you expect to happen
here. You've identified a race where the state machine is running in
unison with phy_stop(), but in this circumstance it is also possible
that the state machine could complete executing and have called
phy_error_precise() before phy_stop() has even been called. In that
case, you'll still get a warning-splat on the console from
phy_error_precise().

The only difference is that phy_stop() won't warn.

That all said, this is obviously buggy, because phy_stop() has set
the phydev state to PHY_HALTED and the state machine has unexpectedly
changed its state.

I wonder whether we should be tracking the phy_start/stop state
separately, since we've had issues with phy_stop() warning when an
error has occurred (see commit 59088b5a946e).

Maybe something like this (untested)?

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index df54c137c5f5..d57f6de8a562 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -810,7 +810,8 @@ int phy_start_cable_test(struct phy_device *phydev,
 		goto out;
 	}
 
-	if (phydev->state < PHY_UP ||
+	if (phydev->oper_state != PHY_OPER_STARTED ||
+	    phydev->state < PHY_UP ||
 	    phydev->state > PHY_CABLETEST) {
 		NL_SET_ERR_MSG(extack,
 			       "PHY not configured. Try setting interface up");
@@ -881,7 +882,8 @@ int phy_start_cable_test_tdr(struct phy_device *phydev,
 		goto out;
 	}
 
-	if (phydev->state < PHY_UP ||
+	if (phydev->oper_state != PHY_OPER_STARTED ||
+	    phydev->state < PHY_UP ||
 	    phydev->state > PHY_CABLETEST) {
 		NL_SET_ERR_MSG(extack,
 			       "PHY not configured. Try setting interface up");
@@ -1364,10 +1366,8 @@ void phy_stop(struct phy_device *phydev)
 	struct net_device *dev = phydev->attached_dev;
 	enum phy_state old_state;
 
-	if (!phy_is_started(phydev) && phydev->state != PHY_DOWN &&
-	    phydev->state != PHY_ERROR) {
-		WARN(1, "called from state %s\n",
-		     phy_state_to_str(phydev->state));
+	if (phydev->oper_state != PHY_OPER_STARTED) {
+		WARN(1, "called when not started\n");
 		return;
 	}
 
@@ -1382,6 +1382,7 @@ void phy_stop(struct phy_device *phydev)
 	if (phydev->sfp_bus)
 		sfp_upstream_stop(phydev->sfp_bus);
 
+	phydev->oper_state = PHY_OPER_STOPPED;
 	phydev->state = PHY_HALTED;
 	phy_process_state_change(phydev, old_state);
 
@@ -1411,9 +1412,8 @@ void phy_start(struct phy_device *phydev)
 {
 	mutex_lock(&phydev->lock);
 
-	if (phydev->state != PHY_READY && phydev->state != PHY_HALTED) {
-		WARN(1, "called from state %s\n",
-		     phy_state_to_str(phydev->state));
+	if (phydev->oper_state != PHY_OPER_STOPPED) {
+		WARN(1, "called when not stopped\n");
 		goto out;
 	}
 
@@ -1423,6 +1423,7 @@ void phy_start(struct phy_device *phydev)
 	/* if phy was suspended, bring the physical link up again */
 	__phy_resume(phydev);
 
+	phydev->oper_state = PHY_OPER_STARTED;
 	phydev->state = PHY_UP;
 
 	phy_start_machine(phydev);
@@ -1442,14 +1443,18 @@ void phy_state_machine(struct work_struct *work)
 			container_of(dwork, struct phy_device, state_queue);
 	struct net_device *dev = phydev->attached_dev;
 	bool needs_aneg = false, do_suspend = false;
-	enum phy_state old_state;
+	enum phy_state old_state, state;
 	const void *func = NULL;
 	bool finished = false;
 	int err = 0;
 
 	mutex_lock(&phydev->lock);
 
-	old_state = phydev->state;
+	state = old_state = phydev->state;
+
+	/* If the PHY is stopped, then force state to halted. */
+	if (phydev->oper_state == PHY_OPER_STOPPED)
+		state = PHY_HALTED;
 
 	switch (phydev->state) {
 	case PHY_DOWN:
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 5dcab361a220..b128d903adb3 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -519,6 +519,11 @@ enum phy_state {
 	PHY_CABLETEST,
 };
 
+enum phy_oper_state {
+	PHY_OPER_STOPPED,
+	PHY_OPER_STARTED,
+};
+
 #define MDIO_MMD_NUM 32
 
 /**
@@ -670,6 +675,7 @@ struct phy_device {
 	int rate_matching;
 
 	enum phy_state state;
+	enum phy_oper_state oper_state;
 
 	u32 dev_flags;
 
@@ -1221,7 +1227,8 @@ int phy_speed_down_core(struct phy_device *phydev);
  */
 static inline bool phy_is_started(struct phy_device *phydev)
 {
-	return phydev->state >= PHY_UP;
+	return phydev->oper_state == PHY_OPER_STARTED &&
+	       phydev->state >= PHY_UP;
 }
 
 void phy_resolve_aneg_pause(struct phy_device *phydev);
Jijie Shao Sept. 5, 2023, 8:49 a.m. UTC | #7
on 2023/9/4 21:43, Andrew Lunn wrote:
> On Mon, Sep 04, 2023 at 05:50:32PM +0800, Jijie Shao wrote:
>> Hi all,
>> We encountered an issue when resetting our netdevice recently, it seems
>> related to this patch.
>>
>> During our process, we stop phy first and call phy_start() later.
>> phy_check_link_status returns error because it read mdio failed. The
>> reason why it happened is that the cmdq is unusable when we reset and we
>> can't access to mdio.
> At what point in the flow below do you apply the reset which stops
> access to the MDIO bus? Ideally you want to do phy_stop(), then apply
> the reset, get the hardware working again, and then do a phy_start().
>

When we do a phy_stop(), hardware might be error and we can't access to
mdio.And our process is read/write mdio failed first, then do phy_stop(),
reset hardware and call phy_start() finally.

We note there are several times lock during phy_state_machine(). The first
is to handle phydev state. It's noting that a competition of phydev lock
happend again if phy_check_link_status() returns an error. Why we don't
held lock until changing state to PHY_ERROR if phy_check_link_status()
returns an error?

Jijie Shao
Jijie Shao Sept. 5, 2023, 8:59 a.m. UTC | #8
on 2023/9/4 22:42, Russell King (Oracle) wrote:
> Are you suggesting that the sequence is:
>
> phy_stop();
> reset netdev
> phy_start();
>
> ?
>
> Is the reason for doing this because you've already detected an issue
> with the hardware, and you're trying to recover it - and before you've
> called phy_stop() the hardware is already dead?

In our case, hardware is already dead before we called phy_stop().

>
> If that is the case, I'm not really sure what you expect to happen
> here. You've identified a race where the state machine is running in

What we expect is that the phy state is PHY_HALTED after we called
phy_stop(). So that we can do a phy_start() successully after resetting
our netdev.

> unison with phy_stop(), but in this circumstance it is also possible
> that the state machine could complete executing and have called
> phy_error_precise() before phy_stop() has even been called. In that
> case, you'll still get a warning-splat on the console from
> phy_error_precise().
>
> The only difference is that phy_stop() won't warn.
>
> That all said, this is obviously buggy, because phy_stop() has set
> the phydev state to PHY_HALTED and the state machine has unexpectedly
> changed its state.
>
> I wonder whether we should be tracking the phy_start/stop state
> separately, since we've had issues with phy_stop() warning when an
> error has occurred (see commit 59088b5a946e).
>
> Maybe something like this (untested)?

Thanks for your patch and it works in our case. By the way, when would you
push this patch?


Jijie Shao
Andrew Lunn Sept. 5, 2023, 12:09 p.m. UTC | #9
> When we do a phy_stop(), hardware might be error and we can't access to
> mdio.And our process is read/write mdio failed first, then do phy_stop(),
> reset hardware and call phy_start() finally.

If the hardware/fimrware is already dead, you have to expect a stack
trace, because the once a second poll can happen, before you notice
the hardware/firmware is dead and call phy_stop().

You might want to also disconnect the PHY and reconnect it after the
reset.

But you should prioritise finding why your hardware/firmware dies,
that is the real problem, and we should not really be adding hacks to
the phylib core to work around broken hardware. Such hacks belong
since your driver, e.g. if the MDIO read/write fails, check if the
firmware still responds, and trigger your reset process immediately.

     Andrew
Russell King (Oracle) Sept. 5, 2023, 1:48 p.m. UTC | #10
On Tue, Sep 05, 2023 at 04:49:31PM +0800, Jijie Shao wrote:
> We note there are several times lock during phy_state_machine(). The first
> is to handle phydev state. It's noting that a competition of phydev lock
> happend again if phy_check_link_status() returns an error. Why we don't
> held lock until changing state to PHY_ERROR if phy_check_link_status()
> returns an error?

You are quite correct that isn't very good. We can easily get rid of
some of this mess, but I don't think all which leaves it still open to
the race you describe.

The problem is phy_suspend().

First, it calls phy_ethtool_get_wol() which takes the phydev lock. This
can be dealt with if we save the state at probe time, and then update
the state when phy_ethtool_set_wol() is called.

Second, phy_suspend() calls ->suspend without holding the phydev lock,
and holding the lock while calling that may not be safe. Having had a
brief look over the implementations (but not delving into any PTP
function they may call) does seem to suggest that shouldn't be a big
problem, but I don't know whether holding the phydev lock while calling
PTP functions is likely to cause issues.

However, looking at that has lead me to the conclusion that there is a
lot of duplication of WoL condition testing. phy_suspend() already
avoids calling ->suspend() if either phy_ethtool_get_wol() indicates
that WoL is enabled, or if the netdev says WoL is enabled.

Many of the ->suspend() implementations, for example,
lan88xx_suspend(), dp83822_suspend(), etc test some kind of flag to
determine whether WoL is enabled and thus seem to be re-implementing
what phy_suspend() is already doing. I think there is scope to delete
this code from several drivers.

The easy bits are the patches I've attached to this email. These
won't on their own be sufficient to close the race you've identified
due to the phy_suspend() issue, but are the best we can do until we
can work out what to do about that.
Russell King (Oracle) Sept. 5, 2023, 2 p.m. UTC | #11
On Tue, Sep 05, 2023 at 02:09:29PM +0200, Andrew Lunn wrote:
> > When we do a phy_stop(), hardware might be error and we can't access to
> > mdio.And our process is read/write mdio failed first, then do phy_stop(),
> > reset hardware and call phy_start() finally.
> 
> If the hardware/fimrware is already dead, you have to expect a stack
> trace, because the once a second poll can happen, before you notice
> the hardware/firmware is dead and call phy_stop().
> 
> You might want to also disconnect the PHY and reconnect it after the
> reset.

Andrew,

I think that's what is being tried here, but there's a race between
phy_stop() and phy_state_machine() which is screwing up phydev->state.

Honestly, the locking in phy_state_machine() is insane, allows this
race to happen, and really needs fixing... and I think that the
phydev->lock usage has become really insane over the years. We have
some driver methods now which are always called with the lock held,
others where the lock may or may not be held, and others where the
lock isn't held - and none of this is documented.

Please can you have a look at the four patches I've just posted as
attached to my previous email. I think we need to start sorting out
some of this crazyness and santising the locking.

My four patches address most of it, except the call to phy_suspend().
If that can be solved, then we can improve the locking more, and
eliminate the race entirely.

If we held the lock over the entire state machine function, then the
problem that has been reported here would not exist - phy_stop()
would not be able to "nip in" during the middle of the PHY state
machine running, and thus we would not see the PHY_HALTED state
overwritten by PHY_ERROR unexpectedly.
Russell King (Oracle) Sept. 5, 2023, 3:24 p.m. UTC | #12
On Tue, Sep 05, 2023 at 02:48:13PM +0100, Russell King (Oracle) wrote:
> On Tue, Sep 05, 2023 at 04:49:31PM +0800, Jijie Shao wrote:
> > We note there are several times lock during phy_state_machine(). The first
> > is to handle phydev state. It's noting that a competition of phydev lock
> > happend again if phy_check_link_status() returns an error. Why we don't
> > held lock until changing state to PHY_ERROR if phy_check_link_status()
> > returns an error?
> 
> You are quite correct that isn't very good. We can easily get rid of
> some of this mess, but I don't think all which leaves it still open to
> the race you describe.
> 
> The problem is phy_suspend().
> 
> First, it calls phy_ethtool_get_wol() which takes the phydev lock. This
> can be dealt with if we save the state at probe time, and then update
> the state when phy_ethtool_set_wol() is called.
> 
> Second, phy_suspend() calls ->suspend without holding the phydev lock,
> and holding the lock while calling that may not be safe. Having had a
> brief look over the implementations (but not delving into any PTP
> function they may call) does seem to suggest that shouldn't be a big
> problem, but I don't know whether holding the phydev lock while calling
> PTP functions is likely to cause issues.
> 
> However, looking at that has lead me to the conclusion that there is a
> lot of duplication of WoL condition testing. phy_suspend() already
> avoids calling ->suspend() if either phy_ethtool_get_wol() indicates
> that WoL is enabled, or if the netdev says WoL is enabled.
> 
> Many of the ->suspend() implementations, for example,
> lan88xx_suspend(), dp83822_suspend(), etc test some kind of flag to
> determine whether WoL is enabled and thus seem to be re-implementing
> what phy_suspend() is already doing. I think there is scope to delete
> this code from several drivers.
> 
> The easy bits are the patches I've attached to this email. These
> won't on their own be sufficient to close the race you've identified
> due to the phy_suspend() issue, but are the best we can do until we
> can work out what to do about that.

Having looked deeper at this, I think there may be a solution. See
these follow-on patches.
Andrew Lunn Sept. 6, 2023, 12:59 p.m. UTC | #13
> Having looked deeper at this, I think there may be a solution. See
> these follow-on patches.

Hi Russell

I'm on vacation at the moment with limited time and network access.

> Move the call to phy_suspend() to the end of phy_state_machine() after
> we release the lock.

I know this is a quick RFC exploration of the problem space, but it
would be good to comment about 'Why?'. Suspend and resume has had
deadlock issues in the past, which is why they don't take the lock.

> Split out the locked and unlocked sections of phy_state_machine() into
> two separate functions which can be called inside the phydev lock and
> outside the phydev lock as appropriate.

Again, i think some mention of suspend/resume would be good, since
that is what is causing these issues. Maybe we also need to add a
comment next to struct phy_device lock about what the lock should be
protecting.

	Andrew
diff mbox series

Patch

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 0c0df38cd1ab..bdf00b2b2c1d 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -52,6 +52,7 @@  static const char *phy_state_to_str(enum phy_state st)
 	PHY_STATE_STR(NOLINK)
 	PHY_STATE_STR(CABLETEST)
 	PHY_STATE_STR(HALTED)
+	PHY_STATE_STR(ERROR)
 	}
 
 	return NULL;
@@ -1184,7 +1185,7 @@  void phy_stop_machine(struct phy_device *phydev)
 static void phy_process_error(struct phy_device *phydev)
 {
 	mutex_lock(&phydev->lock);
-	phydev->state = PHY_HALTED;
+	phydev->state = PHY_ERROR;
 	mutex_unlock(&phydev->lock);
 
 	phy_trigger_machine(phydev);
@@ -1198,10 +1199,10 @@  static void phy_error_precise(struct phy_device *phydev,
 }
 
 /**
- * phy_error - enter HALTED state for this PHY device
+ * phy_error - enter ERROR state for this PHY device
  * @phydev: target phy_device struct
  *
- * Moves the PHY to the HALTED state in response to a read
+ * Moves the PHY to the ERROR state in response to a read
  * or write error, and tells the controller the link is down.
  * Must not be called from interrupt context, or while the
  * phydev->lock is held.
@@ -1326,7 +1327,8 @@  void phy_stop(struct phy_device *phydev)
 	struct net_device *dev = phydev->attached_dev;
 	enum phy_state old_state;
 
-	if (!phy_is_started(phydev) && phydev->state != PHY_DOWN) {
+	if (!phy_is_started(phydev) && phydev->state != PHY_DOWN &&
+	    phydev->state != PHY_ERROR) {
 		WARN(1, "called from state %s\n",
 		     phy_state_to_str(phydev->state));
 		return;
@@ -1443,6 +1445,7 @@  void phy_state_machine(struct work_struct *work)
 		}
 		break;
 	case PHY_HALTED:
+	case PHY_ERROR:
 		if (phydev->link) {
 			phydev->link = 0;
 			phy_link_down(phydev);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 2da87a36200d..7addde5d14c0 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -497,14 +497,17 @@  struct phy_device *mdiobus_scan_c22(struct mii_bus *bus, int addr);
  * Once complete, move to UP to restart the PHY.
  * - phy_stop aborts the running test and moves to @PHY_HALTED
  *
- * @PHY_HALTED: PHY is up, but no polling or interrupts are done. Or
- * PHY is in an error state.
+ * @PHY_HALTED: PHY is up, but no polling or interrupts are done.
  * - phy_start moves to @PHY_UP
+ *
+ * @PHY_ERROR: PHY is up, but is in an error state.
+ * - phy_stop moves to @PHY_HALTED
  */
 enum phy_state {
 	PHY_DOWN = 0,
 	PHY_READY,
 	PHY_HALTED,
+	PHY_ERROR,
 	PHY_UP,
 	PHY_RUNNING,
 	PHY_NOLINK,