diff mbox series

[net] dsa: mv88e6xxx: Do a few more tries before timing out

Message ID 20230708212030.528783-1-linus.walleij@linaro.org (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net] dsa: mv88e6xxx: Do a few more tries before timing out | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1341 this patch: 1341
netdev/cc_maintainers warning 2 maintainers not CCed: pabeni@redhat.com edumazet@google.com
netdev/build_clang success Errors and warnings before: 1364 this patch: 1364
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1364 this patch: 1364
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 13 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Linus Walleij July 8, 2023, 9:20 p.m. UTC
I get sporadic timeouts from the driver when using the
MV88E6352. Increasing the timeout rounds solves the problem.
Some added prints show things like this:

[   58.356209] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
    for switch, addr 1b reg 0b, mask 8000, val 0000, data c000
[   58.367487] mv88e6085 mdio_mux-0.1:00: Timeout waiting for
    ATU op 4000, fid 0001
(...)
[   61.826293] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
    for switch, addr 1c reg 18, mask 8000, val 0000, data 9860
[   61.837560] mv88e6085 mdio_mux-0.1:00: Timeout waiting
    for PHY command 1860 to complete

The reason is probably not the commands: I think those are
mostly fine with the 50+50ms timeout, but the problem
appears when OpenWrt brings up several interfaces in
parallel on a system with 7 populated ports: if one of
them take more than 50 ms and waits one or more of the
others can get stuck on the mutex for the switch and then
this can easily multiply.

Cc: Tobias Waldekranz <tobias@waldekranz.com>
Fixes: 35da1dfd9484 ("net: dsa: mv88e6xxx: Improve performance of busy bit polling")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Andrew Lunn July 8, 2023, 9:31 p.m. UTC | #1
On Sat, Jul 08, 2023 at 11:20:30PM +0200, Linus Walleij wrote:
> I get sporadic timeouts from the driver when using the
> MV88E6352. Increasing the timeout rounds solves the problem.
> Some added prints show things like this:
> 
> [   58.356209] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
>     for switch, addr 1b reg 0b, mask 8000, val 0000, data c000
> [   58.367487] mv88e6085 mdio_mux-0.1:00: Timeout waiting for
>     ATU op 4000, fid 0001
> (...)
> [   61.826293] mv88e6085 mdio_mux-0.1:00: Timeout while waiting
>     for switch, addr 1c reg 18, mask 8000, val 0000, data 9860
> [   61.837560] mv88e6085 mdio_mux-0.1:00: Timeout waiting
>     for PHY command 1860 to complete
> 
> The reason is probably not the commands: I think those are
> mostly fine with the 50+50ms timeout, but the problem
> appears when OpenWrt brings up several interfaces in
> parallel on a system with 7 populated ports: if one of
> them take more than 50 ms and waits one or more of the
> others can get stuck on the mutex for the switch and then
> this can easily multiply.

This is one of the classic bugs i keep an eye out for, and point
developers to iopoll.h to avoid it.

As you say, sleep() or a mutex can take a lot longer than expected, so
the loop exits with ETIMEDOUT, but in fact the operation is
successful, but not noticed.

The correct fix for this is after the loop, there should be one more
read of the register and a test on the condition. Only if that fails
then return -ETIMEDOUT.

	Andrew
diff mbox series

Patch

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 08a46ffd53af..63ee1545b79e 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -92,10 +92,10 @@  int mv88e6xxx_wait_mask(struct mv88e6xxx_chip *chip, int addr, int reg,
 	int i;
 
 	/* There's no bus specific operation to wait for a mask. Even
-	 * if the initial poll takes longer than 50ms, always do at
-	 * least one more attempt.
+	 * if the initial poll takes longer than 50ms, always do a few
+	 * attempts.
 	 */
-	for (i = 0; time_before(jiffies, timeout) || (i < 2); i++) {
+	for (i = 0; time_before(jiffies, timeout) || (i < 10); i++) {
 		err = mv88e6xxx_read(chip, addr, reg, &data);
 		if (err)
 			return err;