[net-next,0/7] net: phy: avoid race when erroring stopping PHY

Message ID ZQMn+Wkvod10vdLd@shell.armlinux.org.uk (mailing list archive)

Message

Russell King (Oracle) Sept. 14, 2023, 3:34 p.m. UTC
This series addresses a problem reported by Jijie Shao where the PHY
state machine can race with phy_stop() leading to an incorrect state.

The issue centres around phy_state_machine() dropping the phydev->lock
mutex briefly, which allows phy_stop() to get in half-way through the
state machine, and when the state machine resumes, it overwrites
phydev->state with a value incompatible with a stopped PHY. This causes
a subsequent phy_start() to issue a warning.
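
The race described above can be modelled in userspace. The following is a
minimal sketch, not the phylib code: every name in it (model_phy,
model_stop(), model_state_machine_racy(), the MODEL_* states) is
hypothetical, a pthread mutex stands in for phydev->lock, and the
"window" callback stands in for a concurrent phy_stop() arriving while
the lock is dropped.

```c
#include <pthread.h>

/* Hypothetical model states; not the kernel's enum phy_state. */
enum model_state { MODEL_HALTED, MODEL_UP, MODEL_RUNNING };

struct model_phy {
	pthread_mutex_t lock;		/* stands in for phydev->lock */
	enum model_state state;		/* stands in for phydev->state */
};

/* Model of a stop request: marks the PHY halted under the lock. */
static void model_stop(struct model_phy *phy)
{
	pthread_mutex_lock(&phy->lock);
	phy->state = MODEL_HALTED;
	pthread_mutex_unlock(&phy->lock);
}

/*
 * Racy variant: the next state is computed under the lock, the lock is
 * dropped, and the result is written back later. Anything that changes
 * phy->state inside the window is silently overwritten.
 */
static void model_state_machine_racy(struct model_phy *phy,
				     void (*window)(struct model_phy *))
{
	enum model_state next;

	pthread_mutex_lock(&phy->lock);
	next = (phy->state == MODEL_UP) ? MODEL_RUNNING : phy->state;
	pthread_mutex_unlock(&phy->lock);

	if (window)
		window(phy);		/* concurrent stop lands here */

	pthread_mutex_lock(&phy->lock);
	phy->state = next;		/* stale write clobbers MODEL_HALTED */
	pthread_mutex_unlock(&phy->lock);
}

/* Single critical section: read and write happen without a gap. */
static void model_state_machine_locked(struct model_phy *phy,
				       void (*window)(struct model_phy *))
{
	pthread_mutex_lock(&phy->lock);
	if (phy->state == MODEL_UP)
		phy->state = MODEL_RUNNING;
	pthread_mutex_unlock(&phy->lock);

	if (window)
		window(phy);		/* stop now runs after the update */
}
```

Passing model_stop as the window callback to the racy variant leaves the
model in MODEL_RUNNING even though a stop was requested; the same call
on the single-section variant leaves it in MODEL_HALTED.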

We address this firstly by using versions of functions that do not take
the lock, moving them into the locked region. The only function that
this can't be done with is phy_suspend() which needs to call into the
driver without taking the lock.
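
As a rough illustration of the "version that does not take the lock"
pattern, the sketch below pairs a helper that expects the caller to hold
the lock with a wrapper that takes it. The names (demo_phy,
demo_start_aneg_locked(), demo_start_aneg(), demo_state_machine_step())
are hypothetical and only model the idea; they are not the functions
touched by this series.

```c
#include <pthread.h>

struct demo_phy {
	pthread_mutex_t lock;	/* stands in for phydev->lock */
	int aneg_starts;	/* counts modelled autoneg restarts */
};

/* Lock-free helper: the caller must already hold phy->lock. */
static void demo_start_aneg_locked(struct demo_phy *phy)
{
	phy->aneg_starts++;
}

/* Wrapper for callers that do not hold the lock. */
static void demo_start_aneg(struct demo_phy *phy)
{
	pthread_mutex_lock(&phy->lock);
	demo_start_aneg_locked(phy);
	pthread_mutex_unlock(&phy->lock);
}

/*
 * A state machine step can call the lock-free helper directly, so the
 * whole step stays inside one critical section instead of unlocking
 * just to call the wrapper.
 */
static void demo_state_machine_step(struct demo_phy *phy)
{
	pthread_mutex_lock(&phy->lock);
	/* ... state decisions would go here ... */
	demo_start_aneg_locked(phy);
	pthread_mutex_unlock(&phy->lock);
}
```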

For phy_suspend(), we split the state machine into two parts - the
initial part which runs under the phydev->lock, and the second part
which runs without the lock.

We finish off by using the split state machine in phy_stop() which
removes another unnecessary unlock-lock sequence from phylib.
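
A userspace sketch of the split may help when reading the later patches.
It assumes a locked half that updates the state and returns a "work"
value, and an unlocked half that performs that work. All names
(sketch_phy, sketch_sm_locked(), sketch_sm_post_work(), sketch_stop(),
the SKETCH_* enums) are hypothetical, and the driver callback taking the
lock itself is an assumption made only to show why it must be called
unlocked.

```c
#include <pthread.h>

enum sketch_state { SKETCH_UP, SKETCH_RUNNING, SKETCH_HALTED };
enum sketch_work { SKETCH_WORK_NONE, SKETCH_WORK_SUSPEND };

struct sketch_phy {
	pthread_mutex_t lock;		/* stands in for phydev->lock */
	enum sketch_state state;
	int suspend_calls;
};

/* Stand-in for a driver suspend hook; it takes the lock itself. */
static void sketch_driver_suspend(struct sketch_phy *phy)
{
	pthread_mutex_lock(&phy->lock);
	phy->suspend_calls++;
	pthread_mutex_unlock(&phy->lock);
}

/* Locked half: caller holds phy->lock. All state writes happen here. */
static enum sketch_work sketch_sm_locked(struct sketch_phy *phy)
{
	switch (phy->state) {
	case SKETCH_UP:
		phy->state = SKETCH_RUNNING;
		return SKETCH_WORK_NONE;
	case SKETCH_HALTED:
		return SKETCH_WORK_SUSPEND;
	default:
		return SKETCH_WORK_NONE;
	}
}

/* Unlocked half: never writes phy->state, so it cannot clobber a stop. */
static void sketch_sm_post_work(struct sketch_phy *phy, enum sketch_work work)
{
	if (work == SKETCH_WORK_SUSPEND)
		sketch_driver_suspend(phy);
}

/* Periodic state machine: locked half, then unlocked half. */
static void sketch_state_machine(struct sketch_phy *phy)
{
	enum sketch_work work;

	pthread_mutex_lock(&phy->lock);
	work = sketch_sm_locked(phy);
	pthread_mutex_unlock(&phy->lock);

	sketch_sm_post_work(phy, work);
}

/* Stop reuses both halves with a single lock/unlock pair. */
static void sketch_stop(struct sketch_phy *phy)
{
	enum sketch_work work;

	pthread_mutex_lock(&phy->lock);
	phy->state = SKETCH_HALTED;
	work = sketch_sm_locked(phy);
	pthread_mutex_unlock(&phy->lock);

	sketch_sm_post_work(phy, work);
}
```

In this model the stop path sets the halted state and runs the locked
half in the same critical section, so there is no point at which the
periodic state machine can observe or overwrite a half-finished stop.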

Changes from RFC:
- Added Jijie Shao's tested-by

 drivers/net/phy/phy.c | 204 +++++++++++++++++++++++++++-----------------------
 1 file changed, 110 insertions(+), 94 deletions(-)

Comments

patchwork-bot+netdevbpf@kernel.org Sept. 17, 2023, 1:40 p.m. UTC | #1
Hello:

This series was applied to netdev/net-next.git (main)
by David S. Miller <davem@davemloft.net>:

On Thu, 14 Sep 2023 16:34:17 +0100 you wrote:
> This series addresses a problem reported by Jijie Shao where the PHY
> state machine can race with phy_stop() leading to an incorrect state.
> 
> The issue centres around phy_state_machine() dropping the phydev->lock
> mutex briefly, which allows phy_stop() to get in half-way through the
> state machine, and when the state machine resumes, it overwrites
> phydev->state with a value incompatible with a stopped PHY. This causes
> a subsequent phy_start() to issue a warning.
> 
> [...]

Here is the summary with links:
  - [net-next,1/7] net: phy: always call phy_process_state_change() under lock
    https://git.kernel.org/netdev/net-next/c/8da77df649c4
  - [net-next,2/7] net: phy: call phy_error_precise() while holding the lock
    https://git.kernel.org/netdev/net-next/c/ef113a60d0a9
  - [net-next,3/7] net: phy: move call to start aneg
    https://git.kernel.org/netdev/net-next/c/ea5968cd7d6e
  - [net-next,4/7] net: phy: move phy_suspend() to end of phy_state_machine()
    https://git.kernel.org/netdev/net-next/c/6e19b3502c59
  - [net-next,5/7] net: phy: move phy_state_machine()
    https://git.kernel.org/netdev/net-next/c/c398ef41b6d4
  - [net-next,6/7] net: phy: split locked and unlocked section of phy_state_machine()
    https://git.kernel.org/netdev/net-next/c/8635c0663e6b
  - [net-next,7/7] net: phy: convert phy_stop() to use split state machine
    https://git.kernel.org/netdev/net-next/c/adcbb85508c8

You are awesome, thank you!