Message ID | 20230708212030.528783-1-linus.walleij@linaro.org (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] dsa: mv88e6xxx: Do a few more tries before timing out | expand |
On Sat, Jul 08, 2023 at 11:20:30PM +0200, Linus Walleij wrote: > I get sporadic timeouts from the driver when using the > MV88E6352. Increasing the timeout rounds solves the problem. > Some added prints show things like this: > > [ 58.356209] mv88e6085 mdio_mux-0.1:00: Timeout while waiting > for switch, addr 1b reg 0b, mask 8000, val 0000, data c000 > [ 58.367487] mv88e6085 mdio_mux-0.1:00: Timeout waiting for > ATU op 4000, fid 0001 > (...) > [ 61.826293] mv88e6085 mdio_mux-0.1:00: Timeout while waiting > for switch, addr 1c reg 18, mask 8000, val 0000, data 9860 > [ 61.837560] mv88e6085 mdio_mux-0.1:00: Timeout waiting > for PHY command 1860 to complete > > The reason is probably not the commands: I think those are > mostly fine with the 50+50ms timeout, but the problem > appears when OpenWrt brings up several interfaces in > parallel on a system with 7 populated ports: if one of > them take more than 50 ms and waits one or more of the > others can get stuck on the mutex for the switch and then > this can easily multiply. This is one of the classic bugs i keep an eye out for, and point developers to iopoll.h to avoid it. As you say, sleep() or a mutex can take a lot longer than expected, so the loop exits with ETIMEDOUT, but in fact the operation is successful, but not noticed. The correct fix for this is after the loop, there should be one more read of the register and a test on the condition. Only if that fails then return -ETIMEDOUT. Andrew
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index 08a46ffd53af..63ee1545b79e 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -92,10 +92,10 @@ int mv88e6xxx_wait_mask(struct mv88e6xxx_chip *chip, int addr, int reg, int i; /* There's no bus specific operation to wait for a mask. Even - * if the initial poll takes longer than 50ms, always do at - * least one more attempt. + * if the initial poll takes longer than 50ms, always do a few + * attempts. */ - for (i = 0; time_before(jiffies, timeout) || (i < 2); i++) { + for (i = 0; time_before(jiffies, timeout) || (i < 10); i++) { err = mv88e6xxx_read(chip, addr, reg, &data); if (err) return err;
I get sporadic timeouts from the driver when using the MV88E6352. Increasing the timeout rounds solves the problem. Some added prints show things like this: [ 58.356209] mv88e6085 mdio_mux-0.1:00: Timeout while waiting for switch, addr 1b reg 0b, mask 8000, val 0000, data c000 [ 58.367487] mv88e6085 mdio_mux-0.1:00: Timeout waiting for ATU op 4000, fid 0001 (...) [ 61.826293] mv88e6085 mdio_mux-0.1:00: Timeout while waiting for switch, addr 1c reg 18, mask 8000, val 0000, data 9860 [ 61.837560] mv88e6085 mdio_mux-0.1:00: Timeout waiting for PHY command 1860 to complete The reason is probably not the commands: I think those are mostly fine with the 50+50ms timeout, but the problem appears when OpenWrt brings up several interfaces in parallel on a system with 7 populated ports: if one of them take more than 50 ms and waits one or more of the others can get stuck on the mutex for the switch and then this can easily multiply. Cc: Tobias Waldekranz <tobias@waldekranz.com> Fixes: 35da1dfd9484 ("net: dsa: mv88e6xxx: Improve performance of busy bit polling") Signed-off-by: Linus Walleij <linus.walleij@linaro.org> --- drivers/net/dsa/mv88e6xxx/chip.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)