Message ID | 20230712223405.861899-1-linus.walleij@linaro.org (mailing list archive) |
---|---|
State | Accepted |
Commit | 95ce158b6c93b28842b54b42ad1cb221b9844062 |
Delegated to: | Netdev Maintainers |
Series | [net,v2] dsa: mv88e6xxx: Do a final check before timing out |
On Thu, Jul 13, 2023 at 12:34:05AM +0200, Linus Walleij wrote:
> I get sporadic timeouts from the driver when using the
> MV88E6352. Reading the status again after the loop fixes the
> problem: the operation is successful but goes undetected.
>
> Some added prints show things like this:
>
> [ 58.356209] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
> for switch, addr 1b reg 0b, mask 8000, val 0000, data c000
> [ 58.367487] mv88e6085 mdio_mux-0.1:00: Timeout waiting for
> ATU op 4000, fid 0001
> (...)
> [ 61.826293] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
> for switch, addr 1c reg 18, mask 8000, val 0000, data 9860
> [ 61.837560] mv88e6085 mdio_mux-0.1:00: Timeout waiting
> for PHY command 1860 to complete
>
> The reason is probably not the commands: I think those are
> mostly fine with the 50+50ms timeout, but the problem
> appears when OpenWrt brings up several interfaces in
> parallel on a system with 7 populated ports: if one of
> them take more than 50 ms and waits one or more of the
> others can get stuck on the mutex for the switch and then
> this can easily multiply.
>
> As we sleep and wait, the function loop needs a final
> check after exiting the loop if we were successful.
>
> Suggested-by: Andrew Lunn <andrew@lunn.ch>
> Cc: Tobias Waldekranz <tobias@waldekranz.com>
> Fixes: 35da1dfd9484 ("net: dsa: mv88e6xxx: Improve performance of busy bit polling")
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>

Hi Linus

Thanks for the new version.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 13 Jul 2023 00:34:05 +0200 you wrote:
> I get sporadic timeouts from the driver when using the
> MV88E6352. Reading the status again after the loop fixes the
> problem: the operation is successful but goes undetected.
>
> Some added prints show things like this:
>
> [ 58.356209] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
> for switch, addr 1b reg 0b, mask 8000, val 0000, data c000
> [ 58.367487] mv88e6085 mdio_mux-0.1:00: Timeout waiting for
> ATU op 4000, fid 0001
> (...)
> [ 61.826293] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
> for switch, addr 1c reg 18, mask 8000, val 0000, data 9860
> [ 61.837560] mv88e6085 mdio_mux-0.1:00: Timeout waiting
> for PHY command 1860 to complete
>
> [...]

Here is the summary with links:
  - [net,v2] dsa: mv88e6xxx: Do a final check before timing out
    https://git.kernel.org/netdev/net/c/95ce158b6c93

You are awesome, thank you!
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 08a46ffd53af..642e93e8623e 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -109,6 +109,13 @@ int mv88e6xxx_wait_mask(struct mv88e6xxx_chip *chip, int addr, int reg,
 		usleep_range(1000, 2000);
 	}
 
+	err = mv88e6xxx_read(chip, addr, reg, &data);
+	if (err)
+		return err;
+
+	if ((data & mask) == val)
+		return 0;
+
 	dev_err(chip->dev, "Timeout while waiting for switch\n");
 	return -ETIMEDOUT;
 }
I get sporadic timeouts from the driver when using the
MV88E6352. Reading the status again after the loop fixes the
problem: the operation is successful but goes undetected.

Some added prints show things like this:

[ 58.356209] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
for switch, addr 1b reg 0b, mask 8000, val 0000, data c000
[ 58.367487] mv88e6085 mdio_mux-0.1:00: Timeout waiting for
ATU op 4000, fid 0001
(...)
[ 61.826293] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
for switch, addr 1c reg 18, mask 8000, val 0000, data 9860
[ 61.837560] mv88e6085 mdio_mux-0.1:00: Timeout waiting
for PHY command 1860 to complete

The reason is probably not the commands: I think those are
mostly fine with the 50+50ms timeout, but the problem
appears when OpenWrt brings up several interfaces in
parallel on a system with 7 populated ports: if one of
them take more than 50 ms and waits one or more of the
others can get stuck on the mutex for the switch and then
this can easily multiply.

As we sleep and wait, the function loop needs a final
check after exiting the loop if we were successful.

Suggested-by: Andrew Lunn <andrew@lunn.ch>
Cc: Tobias Waldekranz <tobias@waldekranz.com>
Fixes: 35da1dfd9484 ("net: dsa: mv88e6xxx: Improve performance of busy bit polling")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
ChangeLog v1->v2:
- Instead of reading 10 times, read an extra time
  after the loop and check if the value is fine.
---
 drivers/net/dsa/mv88e6xxx/chip.c | 7 +++++++
 1 file changed, 7 insertions(+)
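
For readers who want to see the failure mode in isolation, here is a minimal user-space sketch of why a sleep-and-poll loop needs one more read after the deadline. It is not the driver code: `struct fake_switch`, `fake_read()`, `fake_sleep()`, `wait_mask()` and the `final_check` flag are hypothetical names invented for illustration, and a simulated clock stands in for jiffies. The assumption modelled is the one described in the commit message: a sleep that runs far longer than requested, during which the operation completes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define FAKE_BUSY 0x8000 /* busy bit, matching "mask 8000" in the log lines */

/* Hypothetical simulated device driven by a fake clock (not the real chip). */
struct fake_switch {
	long now_ms;        /* simulated current time */
	long done_at_ms;    /* time at which the busy bit clears */
	long sleep_cost_ms; /* how long one sleep really takes */
};

/* Read the status register: busy until the fake clock reaches done_at_ms. */
static int fake_read(struct fake_switch *sw, uint16_t *data)
{
	*data = (sw->now_ms >= sw->done_at_ms) ? 0x4000 : (0x4000 | FAKE_BUSY);
	return 0;
}

/* Sleep: advances the fake clock, possibly far more than the caller wanted. */
static void fake_sleep(struct fake_switch *sw)
{
	assert(sw->sleep_cost_ms > 0);
	sw->now_ms += sw->sleep_cost_ms;
}

/*
 * Poll until (data & mask) == val or timeout_ms has elapsed.
 * With final_check set, read once more after the loop, which is the
 * shape of the fix in the patch above.
 */
static int wait_mask(struct fake_switch *sw, uint16_t mask, uint16_t val,
		     long timeout_ms, int final_check)
{
	long deadline = sw->now_ms + timeout_ms;
	uint16_t data;
	int err;

	while (sw->now_ms < deadline) {
		err = fake_read(sw, &data);
		if (err)
			return err;
		if ((data & mask) == val)
			return 0;
		fake_sleep(sw); /* may overshoot the deadline */
	}

	if (final_check) {
		/* The operation may have finished while we were asleep. */
		err = fake_read(sw, &data);
		if (err)
			return err;
		if ((data & mask) == val)
			return 0;
	}

	return -ETIMEDOUT;
}
```

With an operation that finishes after 10 ms, a 50 ms timeout and a single sleep that stalls for 60 ms, the loop reads the register only once (while still busy) and exits; without the final read this is reported as `-ETIMEDOUT` even though the device finished long before. The final read only rescues operations that really did complete: a device that is still busy after the deadline times out either way.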