Message ID | 20221112203748.68995-1-netdev@kapio-technology.com (mailing list archive) |
---|---|
Headers | show |
Series | mv88e6xxx: Add MAB offload support | expand |
On Sat, 12 Nov 2022 21:37:46 +0100 Hans J. Schultz wrote: > This patchset adds MAB [1] offload support in mv88e6xxx. > > Patch #1: Fix a problem when reading the FID needed to get the VID. > > Patch #2: The MAB implementation for mv88e6xxx. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=a35ec8e38cdd1766f29924ca391a01de20163931 Vladimir, Ido, ack?
On Mon, 14 Nov 2022 18:57:04 -0800 Jakub Kicinski wrote: > On Sat, 12 Nov 2022 21:37:46 +0100 Hans J. Schultz wrote: > > This patchset adds MAB [1] offload support in mv88e6xxx. > > > > Patch #1: Fix a problem when reading the FID needed to get the VID. > > > > Patch #2: The MAB implementation for mv88e6xxx. > > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=a35ec8e38cdd1766f29924ca391a01de20163931 > > Vladimir, Ido, ack? Ah, either way a v9 will be needed: drivers/net/dsa/mv88e6xxx/switchdev.c:33:5: warning: no previous prototype for function 'mv88e6xxx_handle_violation' [-Wmissing-prototypes] int mv88e6xxx_handle_violation(struct mv88e6xxx_chip *chip, int port, ^ drivers/net/dsa/mv88e6xxx/switchdev.c:33:1: note: declare 'static' if the function is not intended to be used outside of this translation unit int mv88e6xxx_handle_violation(struct mv88e6xxx_chip *chip, int port, ^
On Sat, Nov 12, 2022 at 09:37:46PM +0100, Hans J. Schultz wrote: > This patchset adds MAB [1] offload support in mv88e6xxx. > > Patch #1: Fix a problem when reading the FID needed to get the VID. > > Patch #2: The MAB implementation for mv88e6xxx. Just to be sure, this was tested with bridge_locked_port.sh, right?
On 2022-11-15 10:30, Ido Schimmel wrote: > On Sat, Nov 12, 2022 at 09:37:46PM +0100, Hans J. Schultz wrote: >> This patchset adds MAB [1] offload support in mv88e6xxx. >> >> Patch #1: Fix a problem when reading the FID needed to get the VID. >> >> Patch #2: The MAB implementation for mv88e6xxx. > > Just to be sure, this was tested with bridge_locked_port.sh, right? As I have the phy regression I have given notice of, that has simply not been possible. After maybe 50 resets it worked for me at a point (something to do with timing), and I tested it manually. When I have tried to run the selftests, I get errors related to the phy problem, which I have not been able to find a way around.
On Tue, Nov 15, 2022 at 11:26:55AM +0100, netdev@kapio-technology.com wrote: > On 2022-11-15 10:30, Ido Schimmel wrote: > > On Sat, Nov 12, 2022 at 09:37:46PM +0100, Hans J. Schultz wrote: > > > This patchset adds MAB [1] offload support in mv88e6xxx. > > > > > > Patch #1: Fix a problem when reading the FID needed to get the VID. > > > > > > Patch #2: The MAB implementation for mv88e6xxx. > > > > Just to be sure, this was tested with bridge_locked_port.sh, right? > > As I have the phy regression I have given notice of, that has simply not > been possible. After maybe 50 resets it worked for me at a point > (something to do with timing), and I tested it manually. > > When I have tried to run the selftests, I get errors related to the phy > problem, which I have not been able to find a way around. What PHY regression?
On 2022-11-15 11:28, Vladimir Oltean wrote: > On Tue, Nov 15, 2022 at 11:26:55AM +0100, netdev@kapio-technology.com > wrote: >> On 2022-11-15 10:30, Ido Schimmel wrote: >> > On Sat, Nov 12, 2022 at 09:37:46PM +0100, Hans J. Schultz wrote: >> > > This patchset adds MAB [1] offload support in mv88e6xxx. >> > > >> > > Patch #1: Fix a problem when reading the FID needed to get the VID. >> > > >> > > Patch #2: The MAB implementation for mv88e6xxx. >> > >> > Just to be sure, this was tested with bridge_locked_port.sh, right? >> >> As I have the phy regression I have given notice of, that has simply >> not >> been possible. After maybe 50 resets it worked for me at a point >> (something to do with timing), and I tested it manually. >> >> When I have tried to run the selftests, I get errors related to the >> phy >> problem, which I have not been able to find a way around. > > What PHY regression? I had a discussion with Jacub, which resulted in me sending a mail to maintainers on this. The problem is shown below... the phy is marvell/6097/88e3082 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 332 at drivers/net/phy/phy.c:975 phy_error+0x28/0x54 Modules linked in: CPU: 0 PID: 332 Comm: kworker/0:0 Tainted: G W 6.0.0 #17 Hardware name: Freescale i.MX27 (Device Tree Support) Workqueue: events_power_efficient phy_state_machine unwind_backtrace from show_stack+0x18/0x1c show_stack from dump_stack_lvl+0x28/0x30 dump_stack_lvl from __warn+0xb8/0x114 __warn from warn_slowpath_fmt+0x80/0xbc warn_slowpath_fmt from phy_error+0x28/0x54 phy_error from phy_state_machine+0xbc/0x218 phy_state_machine from process_one_work+0x17c/0x244 process_one_work from worker_thread+0x248/0x2cc worker_thread from kthread+0xb0/0xbc kthread from ret_from_fork+0x14/0x2c Exception stack(0xc4a71fb0 to 0xc4a71ff8) 1fa0: 00000000 00000000 00000000 00000000 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 ---[ end trace 0000000000000000 ]---
On Tue, Nov 15, 2022 at 11:52:40AM +0100, netdev@kapio-technology.com wrote: > I had a discussion with Jacub, which resulted in me sending a mail to > maintainers on this. The problem is shown below... > > the phy is marvell/6097/88e3082 > > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 332 at drivers/net/phy/phy.c:975 > phy_error+0x28/0x54 > Modules linked in: > CPU: 0 PID: 332 Comm: kworker/0:0 Tainted: G W 6.0.0 #17 > Hardware name: Freescale i.MX27 (Device Tree Support) > Workqueue: events_power_efficient phy_state_machine > unwind_backtrace from show_stack+0x18/0x1c > show_stack from dump_stack_lvl+0x28/0x30 > dump_stack_lvl from __warn+0xb8/0x114 > __warn from warn_slowpath_fmt+0x80/0xbc > warn_slowpath_fmt from phy_error+0x28/0x54 > phy_error from phy_state_machine+0xbc/0x218 > phy_state_machine from process_one_work+0x17c/0x244 > process_one_work from worker_thread+0x248/0x2cc > worker_thread from kthread+0xb0/0xbc > kthread from ret_from_fork+0x14/0x2c > Exception stack(0xc4a71fb0 to 0xc4a71ff8) > 1fa0: 00000000 00000000 00000000 00000000 > 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > ---[ end trace 0000000000000000 ]--- Was that email public on the lists? I don't see it... The phy_error() is called from phy_state_machine() if one of phy_check_link_status() or phy_start_aneg() fails. Could you please print exactly the value of "err", as well as dig deeper to see which call is failing, all the way into the PHY driver? Easiest way to do that would probably be something like: $ trace-cmd record -e mdio sleep 10 & ... do your stuff ... $ trace-cmd report kworker/u4:3-337 [001] 59.054741: mdio_access: 0000:00:00.3 read phy:0x13 reg:0x01 val:0x7949 kworker/u4:3-337 [001] 59.054941: mdio_access: 0000:00:00.3 read phy:0x13 reg:0x09 val:0x0700 kworker/u4:3-337 [001] 59.055262: mdio_access: 0000:00:00.3 read phy:0x13 reg:0x0a val:0x4000 kworker/u4:3-337 [001] 60.075808: mdio_access: 0000:00:00.3 read phy:0x13 reg:0x1c val:0x3005 "val" will be negative when there is an error. This is to see quicker what fails, and if some MDIO access ever works. If you don't want to enable CONFIG_FTRACE or install trace-cmd, you could also probably do the debugging manually.
On 2022-11-15 12:10, Vladimir Oltean wrote: > On Tue, Nov 15, 2022 at 11:52:40AM +0100, netdev@kapio-technology.com > wrote: >> I had a discussion with Jacub, which resulted in me sending a mail to >> maintainers on this. The problem is shown below... >> >> the phy is marvell/6097/88e3082 >> >> ------------[ cut here ]------------ >> WARNING: CPU: 0 PID: 332 at drivers/net/phy/phy.c:975 >> phy_error+0x28/0x54 >> Modules linked in: >> CPU: 0 PID: 332 Comm: kworker/0:0 Tainted: G W 6.0.0 >> #17 >> Hardware name: Freescale i.MX27 (Device Tree Support) >> Workqueue: events_power_efficient phy_state_machine >> unwind_backtrace from show_stack+0x18/0x1c >> show_stack from dump_stack_lvl+0x28/0x30 >> dump_stack_lvl from __warn+0xb8/0x114 >> __warn from warn_slowpath_fmt+0x80/0xbc >> warn_slowpath_fmt from phy_error+0x28/0x54 >> phy_error from phy_state_machine+0xbc/0x218 >> phy_state_machine from process_one_work+0x17c/0x244 >> process_one_work from worker_thread+0x248/0x2cc >> worker_thread from kthread+0xb0/0xbc >> kthread from ret_from_fork+0x14/0x2c >> Exception stack(0xc4a71fb0 to 0xc4a71ff8) >> 1fa0: 00000000 00000000 00000000 >> 00000000 >> 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 >> 00000000 >> 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 >> ---[ end trace 0000000000000000 ]--- > > Was that email public on the lists? I don't see it... Sorry, yes the public list was not added. > > The phy_error() is called from phy_state_machine() if one of > phy_check_link_status() or phy_start_aneg() fails. > > Could you please print exactly the value of "err", as well as dig > deeper > to see which call is failing, all the way into the PHY driver? It happens on upstart, so I would then have to hack the system upstart to add trace. I also have: mv88e6085 1002b000.ethernet-1:04: switch 0x990 detected: Marvell 88E6097/88E6097F, revision 2 mv88e6085 1002b000.ethernet-1:04: configuring for fixed/rgmii-id link mode mv88e6085 1002b000.ethernet-1:04: Link is Up - 100Mbps/Full - flow control off mv88e6085 1002b000.ethernet-1:04 eth10 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:00] driver [Generic PHY] (irq=POLL) mv88e6085 1002b000.ethernet-1:04 eth6 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:01] driver [Generic PHY] (irq=POLL) mv88e6085 1002b000.ethernet-1:04 eth9 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:02] driver [Generic PHY] (irq=POLL) mv88e6085 1002b000.ethernet-1:04 eth5 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:03] driver [Generic PHY] (irq=POLL) mv88e6085 1002b000.ethernet-1:04 eth8 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:04] driver [Generic PHY] (irq=POLL) mv88e6085 1002b000.ethernet-1:04 eth4 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:05] driver [Generic PHY] (irq=POLL) mv88e6085 1002b000.ethernet-1:04 eth7 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:06] driver [Generic PHY] (irq=POLL) mv88e6085 1002b000.ethernet-1:04 eth3 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:07] driver [Generic PHY] (irq=POLL) mv88e6085 1002b000.ethernet-1:04 eth2 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:08] driver [Marvell 88E1112] (irq=174) mv88e6085 1002b000.ethernet-1:04 eth1 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:09] driver [Marvell 88E1112] (irq=175) after this and adding the ifaces to the bridge, it continues like: br0: port 1(eth10) entered blocking state br0: port 1(eth10) entered disabled state br0: port 2(eth6) entered blocking state br0: port 2(eth6) entered disabled state device eth6 entered promiscuous mode device eth10 entered promiscuous mode br0: port 3(eth9) entered blocking state br0: port 3(eth9) entered disabled state device eth9 entered promiscuous mode br0: port 4(eth5) entered blocking state br0: port 4(eth5) entered disabled state device eth5 entered promiscuous mode br0: port 5(eth8) entered blocking state br0: port 5(eth8) entered disabled state device eth8 entered promiscuous mode br0: port 6(eth4) entered blocking state br0: port 6(eth4) entered disabled state mv88e6085 1002b000.ethernet-1:04: Timeout while waiting for switch mv88e6085 1002b000.ethernet-1:04: port 0 failed to add 9a:af:03:f1:bd:0a vid 1 to fdb: -110 device eth4 entered promiscuous mode br0: port 7(eth7) entered blocking state br0: port 7(eth7) entered disabled state I don't know if that gives ay clues...? Otherwise I have to take more time to see what I can dig out. The easiest for me is then to add some printk statements giving targeted information if told what and where... > > Easiest way to do that would probably be something like: > > $ trace-cmd record -e mdio sleep 10 & > ... do your stuff ... > $ trace-cmd report > kworker/u4:3-337 [001] 59.054741: mdio_access: > 0000:00:00.3 read phy:0x13 reg:0x01 val:0x7949 > kworker/u4:3-337 [001] 59.054941: mdio_access: > 0000:00:00.3 read phy:0x13 reg:0x09 val:0x0700 > kworker/u4:3-337 [001] 59.055262: mdio_access: > 0000:00:00.3 read phy:0x13 reg:0x0a val:0x4000 > kworker/u4:3-337 [001] 60.075808: mdio_access: > 0000:00:00.3 read phy:0x13 reg:0x1c val:0x3005 > > "val" will be negative when there is an error. This is to see quicker > what fails, > and if some MDIO access ever works. > > If you don't want to enable CONFIG_FTRACE or install trace-cmd, you > could also probably do the debugging manually.
On Tue, Nov 15, 2022 at 12:31:59PM +0100, netdev@kapio-technology.com wrote: > It happens on upstart, so I would then have to hack the system upstart to > add trace. Hack upstart or disable the service that brings the switch ports up, and bring them up manually... > I also have: > mv88e6085 1002b000.ethernet-1:04: switch 0x990 detected: Marvell 88E6097/88E6097F, revision 2 > mv88e6085 1002b000.ethernet-1:04: configuring for fixed/rgmii-id link mode > mv88e6085 1002b000.ethernet-1:04: Link is Up - 100Mbps/Full - flow control off > mv88e6085 1002b000.ethernet-1:04 eth10 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:00] driver [Generic PHY] (irq=POLL) > mv88e6085 1002b000.ethernet-1:04 eth6 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:01] driver [Generic PHY] (irq=POLL) > mv88e6085 1002b000.ethernet-1:04 eth9 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:02] driver [Generic PHY] (irq=POLL) > mv88e6085 1002b000.ethernet-1:04 eth5 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:03] driver [Generic PHY] (irq=POLL) > mv88e6085 1002b000.ethernet-1:04 eth8 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:04] driver [Generic PHY] (irq=POLL) > mv88e6085 1002b000.ethernet-1:04 eth4 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:05] driver [Generic PHY] (irq=POLL) > mv88e6085 1002b000.ethernet-1:04 eth7 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:06] driver [Generic PHY] (irq=POLL) > mv88e6085 1002b000.ethernet-1:04 eth3 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:07] driver [Generic PHY] (irq=POLL) > mv88e6085 1002b000.ethernet-1:04 eth2 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:08] driver [Marvell 88E1112] (irq=174) > mv88e6085 1002b000.ethernet-1:04 eth1 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:09] driver [Marvell 88E1112] (irq=175) > > after this and adding the ifaces to the bridge, it continues like: > > br0: port 1(eth10) entered blocking state > br0: port 1(eth10) entered disabled state > br0: port 2(eth6) entered blocking state > br0: port 2(eth6) entered disabled state > device eth6 entered promiscuous mode > device eth10 entered promiscuous mode > br0: port 3(eth9) entered blocking state > br0: port 3(eth9) entered disabled state > device eth9 entered promiscuous mode > br0: port 4(eth5) entered blocking state > br0: port 4(eth5) entered disabled state > device eth5 entered promiscuous mode > br0: port 5(eth8) entered blocking state > br0: port 5(eth8) entered disabled state > device eth8 entered promiscuous mode > br0: port 6(eth4) entered blocking state > br0: port 6(eth4) entered disabled state > mv88e6085 1002b000.ethernet-1:04: Timeout while waiting for switch > mv88e6085 1002b000.ethernet-1:04: port 0 failed to add 9a:af:03:f1:bd:0a vid 1 to fdb: -110 Dumb question, but if you get errors like this, how can you test anything at all in the patches that you submit? > device eth4 entered promiscuous mode > br0: port 7(eth7) entered blocking state > br0: port 7(eth7) entered disabled state > > I don't know if that gives ay clues...? Not really. That error might be related - something indicating a breakage in the top-level (fec IIUC) MDIO controller, or not. There was "recent" rework almost everywhere. For example commit 35da1dfd9484 ("net: dsa: mv88e6xxx: Improve performance of busy bit polling"). That also hooks into the mv88e6xxx cascaded MDIO controller (mv88e6xxx_g2_smi_phy_wait), so there might be something there. > > Otherwise I have to take more time to see what I can dig out. The easiest > for me is then to add some printk statements giving targeted information if told what and > where... Do you have a timeline for when the regression was introduced? Commit 35da1dfd9484 reverts cleanly, so I suppose giving it a go with that reverted might be worth a shot. Otherwise, a bisect from a known working version only takes a couple of hours, and shouldn't require other changes to the setup.
On 2022-11-15 13:22, Vladimir Oltean wrote: > On Tue, Nov 15, 2022 at 12:31:59PM +0100, netdev@kapio-technology.com > wrote: >> It happens on upstart, so I would then have to hack the system upstart >> to >> add trace. > > Hack upstart or disable the service that brings the switch ports up, > and > bring them up manually... > >> I also have: >> mv88e6085 1002b000.ethernet-1:04: switch 0x990 detected: Marvell >> 88E6097/88E6097F, revision 2 >> mv88e6085 1002b000.ethernet-1:04: configuring for fixed/rgmii-id link >> mode >> mv88e6085 1002b000.ethernet-1:04: Link is Up - 100Mbps/Full - flow >> control off >> mv88e6085 1002b000.ethernet-1:04 eth10 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:00] driver >> [Generic PHY] (irq=POLL) >> mv88e6085 1002b000.ethernet-1:04 eth6 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:01] driver >> [Generic PHY] (irq=POLL) >> mv88e6085 1002b000.ethernet-1:04 eth9 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:02] driver >> [Generic PHY] (irq=POLL) >> mv88e6085 1002b000.ethernet-1:04 eth5 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:03] driver >> [Generic PHY] (irq=POLL) >> mv88e6085 1002b000.ethernet-1:04 eth8 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:04] driver >> [Generic PHY] (irq=POLL) >> mv88e6085 1002b000.ethernet-1:04 eth4 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:05] driver >> [Generic PHY] (irq=POLL) >> mv88e6085 1002b000.ethernet-1:04 eth7 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:06] driver >> [Generic PHY] (irq=POLL) >> mv88e6085 1002b000.ethernet-1:04 eth3 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:07] driver >> [Generic PHY] (irq=POLL) >> mv88e6085 1002b000.ethernet-1:04 eth2 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:08] driver >> [Marvell 88E1112] (irq=174) >> mv88e6085 1002b000.ethernet-1:04 eth1 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:09] driver >> [Marvell 88E1112] (irq=175) >> >> after this and adding the ifaces to the bridge, it continues like: >> >> br0: port 1(eth10) entered blocking state >> br0: port 1(eth10) entered disabled state >> br0: port 2(eth6) entered blocking state >> br0: port 2(eth6) entered disabled state >> device eth6 entered promiscuous mode >> device eth10 entered promiscuous mode >> br0: port 3(eth9) entered blocking state >> br0: port 3(eth9) entered disabled state >> device eth9 entered promiscuous mode >> br0: port 4(eth5) entered blocking state >> br0: port 4(eth5) entered disabled state >> device eth5 entered promiscuous mode >> br0: port 5(eth8) entered blocking state >> br0: port 5(eth8) entered disabled state >> device eth8 entered promiscuous mode >> br0: port 6(eth4) entered blocking state >> br0: port 6(eth4) entered disabled state >> mv88e6085 1002b000.ethernet-1:04: Timeout while waiting for switch >> mv88e6085 1002b000.ethernet-1:04: port 0 failed to add >> 9a:af:03:f1:bd:0a vid 1 to fdb: -110 > > Dumb question, but if you get errors like this, how can you test > anything at all > in the patches that you submit? The answer is that I don't always get these errors... once in a while (maaany resets) it does not happen, and all is fine. The error code is... well of course -110 (timed out). > >> device eth4 entered promiscuous mode >> br0: port 7(eth7) entered blocking state >> br0: port 7(eth7) entered disabled state >> >> I don't know if that gives ay clues...? > > Not really. That error might be related - something indicating a > breakage > in the top-level (fec IIUC) MDIO controller, or not. There was "recent" > rework almost everywhere. For example commit 35da1dfd9484 ("net: dsa: > mv88e6xxx: Improve performance of busy bit polling"). That also hooks > into the mv88e6xxx cascaded MDIO controller > (mv88e6xxx_g2_smi_phy_wait), > so there might be something there. > I can check that out, but I remember that net-next has not worked on this device for quite some time... >> >> Otherwise I have to take more time to see what I can dig out. The >> easiest >> for me is then to add some printk statements giving targeted >> information if told what and >> where... > > Do you have a timeline for when the regression was introduced? > Commit 35da1dfd9484 reverts cleanly, so I suppose giving it a go with > that reverted might be worth a shot. Otherwise, a bisect from a known > working version only takes a couple of hours, and shouldn't require > other changes to the setup. I can't say when the regression was introduced as I used modified kernels, but something between 5.16 and 5.17, I know there was something phy related, but it's a bit more complicated, so it is only a guess... I would have to get the whole locked port patch set etc. on a 5.16 to see if that works.
> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:00] driver [Generic > PHY] (irq=POLL) It is interesting it is using the generic PHY driver, not the Marvell PHY driver. > mv88e6085 1002b000.ethernet-1:04 eth6 (uninitialized): PHY > [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:01] driver [Generic > PHY] (irq=POLL) > mv88e6085 1002b000.ethernet-1:04 eth9 (uninitialized): PHY > [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:02] driver [Generic > PHY] (irq=POLL) > mv88e6085 1002b000.ethernet-1:04 eth5 (uninitialized): PHY > [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:03] driver [Generic > PHY] (irq=POLL) > mv88e6085 1002b000.ethernet-1:04 eth8 (uninitialized): PHY > [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:04] driver [Generic > PHY] (irq=POLL) > mv88e6085 1002b000.ethernet-1:04 eth4 (uninitialized): PHY > [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:05] driver [Generic > PHY] (irq=POLL) > mv88e6085 1002b000.ethernet-1:04 eth7 (uninitialized): PHY > [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:06] driver [Generic > PHY] (irq=POLL) > mv88e6085 1002b000.ethernet-1:04 eth3 (uninitialized): PHY > [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:07] driver [Generic > PHY] (irq=POLL) > mv88e6085 1002b000.ethernet-1:04 eth2 (uninitialized): PHY > [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:08] driver > [Marvell 88E1112] (irq=174) > mv88e6085 1002b000.ethernet-1:04 eth1 (uninitialized): PHY > [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:09] driver > [Marvell 88E1112] (irq=175) And here it does use the Marvell PHY driver. Are ports 8 and 9 external, where as the others are internal? Andrew
On 2022-11-15 13:22, Vladimir Oltean wrote: > Do you have a timeline for when the regression was introduced? > Commit 35da1dfd9484 reverts cleanly, so I suppose giving it a go with > that reverted might be worth a shot. Otherwise, a bisect from a known > working version only takes a couple of hours, and shouldn't require > other changes to the setup. Wow! Reverting 35da1dfd9484 and the problem has disappeared. :-) The bridge_locked_port.sh tests all succeeded... as expected... ;-)
On 2022-11-15 14:21, Andrew Lunn wrote: >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:00] driver >> [Generic >> PHY] (irq=POLL) > > It is interesting it is using the generic PHY driver, not the Marvell > PHY driver. > >> mv88e6085 1002b000.ethernet-1:04 eth6 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:01] driver >> [Generic >> PHY] (irq=POLL) >> mv88e6085 1002b000.ethernet-1:04 eth9 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:02] driver >> [Generic >> PHY] (irq=POLL) >> mv88e6085 1002b000.ethernet-1:04 eth5 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:03] driver >> [Generic >> PHY] (irq=POLL) >> mv88e6085 1002b000.ethernet-1:04 eth8 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:04] driver >> [Generic >> PHY] (irq=POLL) >> mv88e6085 1002b000.ethernet-1:04 eth4 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:05] driver >> [Generic >> PHY] (irq=POLL) >> mv88e6085 1002b000.ethernet-1:04 eth7 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:06] driver >> [Generic >> PHY] (irq=POLL) >> mv88e6085 1002b000.ethernet-1:04 eth3 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:07] driver >> [Generic >> PHY] (irq=POLL) >> mv88e6085 1002b000.ethernet-1:04 eth2 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:08] driver >> [Marvell 88E1112] (irq=174) >> mv88e6085 1002b000.ethernet-1:04 eth1 (uninitialized): PHY >> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:09] driver >> [Marvell 88E1112] (irq=175) > > And here it does use the Marvell PHY driver. Are ports 8 and 9 > external, where as the others are internal? > > Andrew It is an 8 port switch, so the two (8+9) are for internal use, I guess, as I have not had any part in the system design etc of this device. I have it for test and development purposes, incl. upstreaming.
On Tue, Nov 15, 2022 at 02:25:13PM +0100, netdev@kapio-technology.com wrote: > On 2022-11-15 13:22, Vladimir Oltean wrote: > > Do you have a timeline for when the regression was introduced? > > Commit 35da1dfd9484 reverts cleanly, so I suppose giving it a go with > > that reverted might be worth a shot. Otherwise, a bisect from a known > > working version only takes a couple of hours, and shouldn't require > > other changes to the setup. > > Wow! Reverting 35da1dfd9484 and the problem has disappeared. :-) See? That wasn't very painful, was it. Now, why doesn't that commit work for you? that's the real question. I'm going to say there's a big assumption made there. The old code used to poll up to 16 times with sleeps of up to 2 ms in between. The new code polls until at least 50 ms have elapsed. I can imagine the thought process being something like "hmm, 16*2=32ms, let's round that up to 50 just to be sure". But the effective timeout was not really increased. Rather said, in the old code there was never really an effective timeout, since the polling code could have been preempted many times, and these preemptions would not be accounted against the msleep(2) calls. Whereas the new code really tracks something approximating 50 ms now. Could you please add the reverted patch back to your git tree, and see by how much do you need to increase the timeout for your system to get reliable results? > The bridge_locked_port.sh tests all succeeded... as expected... ;-) Yeah, I confirm this works on a 6390 over here. But I still don't like the log spam from the IRQ handlers. [root@mox:~/.../drivers/net/dsa] # ./bridge_locked_port.sh lan1 lan2 lan3 lan4 [ 1298.218224] mv88e6085 d0032004.mdio-mii:10 lan1: configuring for phy/gmii link mode [ 1299.150249] mv88e6085 d0032004.mdio-mii:10 lan4: configuring for phy/gmii link mode [ 1299.791778] br0: port 1(lan2) entered blocking state [ 1299.800824] br0: port 1(lan2) entered disabled state [ 1300.419596] br0: port 2(lan3) entered blocking state [ 1300.425016] br0: port 2(lan3) entered disabled state [ 1300.527959] device lan3 entered promiscuous mode [ 1300.538124] device lan2 entered promiscuous mode [ 1300.733679] mv88e6085 d0032004.mdio-mii:10 lan2: configuring for phy/gmii link mode [ 1300.765642] 8021q: adding VLAN 0 to HW filter on device lan2 [ 1300.818540] mv88e6085 d0032004.mdio-mii:10 lan3: configuring for phy/gmii link mode [ 1300.855003] 8021q: adding VLAN 0 to HW filter on device lan3 [ 1303.903840] mv88e6085 d0032004.mdio-mii:10 lan3: Link is Up - 1Gbps/Full - flow control rx/tx [ 1303.912939] IPv6: ADDRCONF(NETDEV_CHANGE): lan3: link becomes ready [ 1303.928313] br0: port 2(lan3) entered blocking state [ 1303.933685] br0: port 2(lan3) entered forwarding state [ 1303.948607] mv88e6085 d0032004.mdio-mii:10 lan4: Link is Up - 1Gbps/Full - flow control rx/tx [ 1303.985784] IPv6: ADDRCONF(NETDEV_CHANGE): br0: link becomes ready [ 1303.999962] IPv6: ADDRCONF(NETDEV_CHANGE): lan4: link becomes ready [ 1304.003780] mv88e6085 d0032004.mdio-mii:10 lan2: Link is Up - 1Gbps/Full - flow control rx/tx [ 1304.017407] IPv6: ADDRCONF(NETDEV_CHANGE): lan2: link becomes ready [ 1304.027922] br0: port 1(lan2) entered blocking state [ 1304.033157] br0: port 1(lan2) entered forwarding state [ 1304.043508] mv88e6085 d0032004.mdio-mii:10 lan1: Link is Up - 1Gbps/Full - flow control rx/tx [ 1304.052601] IPv6: ADDRCONF(NETDEV_CHANGE): lan4.100: link becomes ready [ 1304.158404] IPv6: ADDRCONF(NETDEV_CHANGE): lan1: link becomes ready [ 1304.167574] IPv6: ADDRCONF(NETDEV_CHANGE): lan1.100: link becomes ready TEST: Locked port ipv4 [ OK ] TEST: Locked port ipv6 [ OK ] [ 1337.627010] mv88e6xxx_g1_vtu_prob_irq_thread_fn: 1 callbacks suppressed [ 1337.627042] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1337.644822] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1337.727920] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1337.862053] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1337.956972] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1338.065996] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1338.136905] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1338.238182] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1338.339244] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1338.440106] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1342.655520] mv88e6xxx_g1_vtu_prob_irq_thread_fn: 33 callbacks suppressed [ 1342.655568] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1342.763619] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1342.835264] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1342.847464] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1342.865387] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1342.971661] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1343.075774] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1343.179562] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1343.283426] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1343.387409] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1348.758296] mv88e6xxx_g1_vtu_prob_irq_thread_fn: 24 callbacks suppressed [ 1348.758328] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1348.858879] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1348.990492] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1349.063977] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1349.165979] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1349.268187] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1349.373827] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1349.472601] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1349.573752] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 [ 1349.585837] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9 TEST: Locked port vlan [ OK ] [ 1352.550194] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1352.659486] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1352.792941] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1352.888959] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1352.968150] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1353.072434] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1353.182844] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1353.280595] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1353.384755] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1353.488602] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1358.704139] mv88e6xxx_g1_atu_prob_irq_thread_fn: 39 callbacks suppressed [ 1358.704172] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1359.280930] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1359.388609] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1359.524500] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1359.620272] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1359.696597] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1359.727872] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1359.801563] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1359.908665] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1360.012063] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1363.858627] mv88e6xxx_g1_atu_prob_irq_thread_fn: 29 callbacks suppressed [ 1363.858674] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1364.879407] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 TEST: Locked port MAB [ OK ] [ 1369.837089] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for a0:b0:c0:c0:b0:a0 portvec 4 spid 2 [ 1369.971489] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for a0:b0:c0:c0:b0:a0 portvec 4 spid 2 [ 1370.070444] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for a0:b0:c0:c0:b0:a0 portvec 4 spid 2 [ 1370.143784] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for a0:b0:c0:c0:b0:a0 portvec 4 spid 2 [ 1370.245843] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for a0:b0:c0:c0:b0:a0 portvec 4 spid 2 [ 1371.465429] mv88e6085 d0032004.mdio-mii:10: ATU member violation for a0:b0:c0:c0:b0:a0 portvec 8 spid 2 [ 1371.599084] mv88e6085 d0032004.mdio-mii:10: ATU member violation for a0:b0:c0:c0:b0:a0 portvec 8 spid 2 [ 1371.703341] mv88e6085 d0032004.mdio-mii:10: ATU member violation for a0:b0:c0:c0:b0:a0 portvec 8 spid 2 [ 1371.794905] mv88e6085 d0032004.mdio-mii:10: ATU member violation for a0:b0:c0:c0:b0:a0 portvec 8 spid 2 [ 1371.873307] mv88e6085 d0032004.mdio-mii:10: ATU member violation for a0:b0:c0:c0:b0:a0 portvec 8 spid 2 [ 1372.022403] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 TEST: Locked port MAB roam [ OK ] TEST: Locked port MAB configuration [ OK ] [ 1373.039089] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2 [ 1373.060995] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:05 portvec 4 spid 2 [ 1373.167964] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:05 portvec 4 spid 2 [ 1373.296506] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:05 portvec 4 spid 2 TEST: Locked port MAB FDB flush [ OK ] [ 1375.330376] mv88e6085 d0032004.mdio-mii:10 lan3: Link is Down [ 1375.341372] br0: port 2(lan3) entered disabled state [ 1375.461136] mv88e6085 d0032004.mdio-mii:10 lan2: Link is Down [ 1375.489476] br0: port 1(lan2) entered disabled state [ 1375.611159] device lan3 left promiscuous mode [ 1375.618253] br0: port 2(lan3) entered disabled state [ 1375.737909] device lan2 left promiscuous mode [ 1375.745305] br0: port 1(lan2) entered disabled state
On 2022-11-15 15:56, Vladimir Oltean wrote: > On Tue, Nov 15, 2022 at 02:25:13PM +0100, netdev@kapio-technology.com > wrote: >> On 2022-11-15 13:22, Vladimir Oltean wrote: > >> The bridge_locked_port.sh tests all succeeded... as expected... ;-) > > Yeah, I confirm this works on a 6390 over here. Thanks :-) > But I still don't like > the log spam from the IRQ handlers. > > [root@mox:~/.../drivers/net/dsa] # ./bridge_locked_port.sh lan1 lan2 > lan3 lan4 > [ 1298.218224] mv88e6085 d0032004.mdio-mii:10 lan1: configuring for > ... I think the violation log issue should be handled in a seperate patch(set)?
On 2022-11-15 15:56, Vladimir Oltean wrote: > On Tue, Nov 15, 2022 at 02:25:13PM +0100, netdev@kapio-technology.com > wrote: >> On 2022-11-15 13:22, Vladimir Oltean wrote: >> > Do you have a timeline for when the regression was introduced? >> > Commit 35da1dfd9484 reverts cleanly, so I suppose giving it a go with >> > that reverted might be worth a shot. Otherwise, a bisect from a known >> > working version only takes a couple of hours, and shouldn't require >> > other changes to the setup. >> >> Wow! Reverting 35da1dfd9484 and the problem has disappeared. :-) > > See? That wasn't very painful, was it. Indeed it was not, when you get a good tip. Thanks alot! :-) > > Now, why doesn't that commit work for you? that's the real question. > I'm going to say there's a big assumption made there. The old code used > to poll up to 16 times with sleeps of up to 2 ms in between. > The new code polls until at least 50 ms have elapsed. > I can imagine the thought process being something like "hmm, 16*2=32ms, > let's round that up to 50 just to be sure". But the effective timeout > was not really increased. Rather said, in the old code there was never > really an effective timeout, since the polling code could have been > preempted many times, and these preemptions would not be accounted > against the msleep(2) calls. Whereas the new code really tracks > something approximating 50 ms now. > > Could you please add the reverted patch back to your git tree, and see > by how much do you need to increase the timeout for your system to get > reliable results? > Yes, so you want me to simply increase the 50ms on line 58 in smi.c... I have now tried to increase it even to 10000ms == 10s and it didn't help, so something else must be needed...
On Tue, Nov 15, 2022 at 04:14:05PM +0100, netdev@kapio-technology.com wrote:
> I think the violation log issue should be handled in a seperate patch(set)?
idk, what do you plan to do about it?
On Tue, Nov 15, 2022 at 05:03:01PM +0100, netdev@kapio-technology.com wrote: > Yes, so you want me to simply increase the 50ms on line 58 in smi.c... > > I have now tried to increase it even to 10000ms == 10s and it didn't help, > so something else must be needed... Not only that, but also the one in mv88e6xxx_wait_mask().
On 2022-11-15 17:15, Vladimir Oltean wrote: > On Tue, Nov 15, 2022 at 04:14:05PM +0100, netdev@kapio-technology.com > wrote: >> I think the violation log issue should be handled in a seperate >> patch(set)? > > idk, what do you plan to do about it? When I think about it, I think that the messages should be disabled by default and like one enables NO_LL_LEARN (echo 1 >/sys/class...), they can be activated if one needs it. And of course the ethtool stats will still be there anyhow... How does that sound?
On Tue, Nov 15, 2022 at 06:11:08PM +0100, netdev@kapio-technology.com wrote: > On 2022-11-15 17:15, Vladimir Oltean wrote: > > On Tue, Nov 15, 2022 at 04:14:05PM +0100, netdev@kapio-technology.com > > wrote: > > > I think the violation log issue should be handled in a seperate > > > patch(set)? > > > > idk, what do you plan to do about it? > > When I think about it, I think that the messages should be disabled by default > and like one enables NO_LL_LEARN (echo 1 >/sys/class...), they can be activated > if one needs it. And of course the ethtool stats will still be there anyhow... > > How does that sound? Just make them trace points... If you don't know how to do that, just search for who has "#define TRACE_SYSTEM" in drivers/net/ethernet/ and steal from them...
On 2022-11-15 17:18, Vladimir Oltean wrote: > On Tue, Nov 15, 2022 at 05:03:01PM +0100, netdev@kapio-technology.com > wrote: >> Yes, so you want me to simply increase the 50ms on line 58 in smi.c... >> >> I have now tried to increase it even to 10000ms == 10s and it didn't >> help, >> so something else must be needed... > > Not only that, but also the one in mv88e6xxx_wait_mask(). So, I will not present you with a graph as it is a tedious process (probably it is some descending gaussian curve wrt timeout occurring). But 100ms fails, 125 I had 1 port fail, at 140, 150 and 180 I saw timeouts resulting in fdb add fails, like (and occasional port fail): mv88e6085 1002b000.ethernet-1:04: Timeout while waiting for switch mv88e6085 1002b000.ethernet-1:04: port 0 failed to add be:7c:96:06:9f:09 vid 1 to fdb: -110 At around 200 ms it looks like it is getting stable (like 5 runs, no problems). So with the gaussian curve tail whipping ones behind (risque of failure) it might need to be like 300 ms in my case... :-)
On Tue, Nov 15, 2022 at 07:40:02PM +0100, netdev@kapio-technology.com wrote: > So, I will not present you with a graph as it is a tedious process (probably > it is some descending gaussian curve wrt timeout occurring). > > But 100ms fails, 125 I had 1 port fail, at 140, 150 and 180 I saw timeouts > resulting in fdb add fails, like (and occasional port fail): > > mv88e6085 1002b000.ethernet-1:04: Timeout while waiting for switch > mv88e6085 1002b000.ethernet-1:04: port 0 failed to add be:7c:96:06:9f:09 vid > 1 to fdb: -110 > > At around 200 ms it looks like it is getting stable (like 5 runs, no > problems). > > So with the gaussian curve tail whipping ones behind (risque of failure) it > might need to be like 300 ms in my case... :-) Pick a value that is high enough to be reliable and submit a patch to "net" where you present the evidence for it (top-level MDIO controller, SoC, switch, kernel). I don't believe there's much to read into. A large timeout shouldn't have a negative effect on the MDIO performance, because it just determines how long it takes until the kernel declares it dead, rather than how long it takes for transactions to actually take place.
> Pick a value that is high enough to be reliable and submit a patch to > "net" where you present the evidence for it (top-level MDIO controller, > SoC, switch, kernel). I don't believe there's much to read into. A large > timeout shouldn't have a negative effect on the MDIO performance, > because it just determines how long it takes until the kernel declares > it dead, rather than how long it takes for transactions to actually take > place. Yes, please do that. It is interesting that you found this. I'm just curious, so no need to investigate if you don't have time. Is there a pattern, is it the same register which always times out? Andrew
On 2022-11-16 11:24, Vladimir Oltean wrote: > On Tue, Nov 15, 2022 at 07:40:02PM +0100, netdev@kapio-technology.com > wrote: >> So, I will not present you with a graph as it is a tedious process >> (probably >> it is some descending gaussian curve wrt timeout occurring). >> >> But 100ms fails, 125 I had 1 port fail, at 140, 150 and 180 I saw >> timeouts >> resulting in fdb add fails, like (and occasional port fail): >> >> mv88e6085 1002b000.ethernet-1:04: Timeout while waiting for switch >> mv88e6085 1002b000.ethernet-1:04: port 0 failed to add >> be:7c:96:06:9f:09 vid >> 1 to fdb: -110 >> >> At around 200 ms it looks like it is getting stable (like 5 runs, no >> problems). >> >> So with the gaussian curve tail whipping ones behind (risque of >> failure) it >> might need to be like 300 ms in my case... :-) > > Pick a value that is high enough to be reliable and submit a patch to > "net" where you present the evidence for it (top-level MDIO controller, > SoC, switch, kernel). I don't believe there's much to read into. A > large > timeout shouldn't have a negative effect on the MDIO performance, > because it just determines how long it takes until the kernel declares > it dead, rather than how long it takes for transactions to actually > take > place. Would it not be appropriate to have a define that specifies the value instead of the same value two places as it is now? And in so case, what would be an appropriate name?
> Would it not be appropriate to have a define that specifies the > value instead of the same value two places as it is now? Yes. > And in so case, what would be an appropriate name? MV88E6XXX_WAIT_TIMEOUT_MS ? Andrew