Message ID | CAJq09z5FCgG-+jVT7uxh1a-0CiiFsoKoHYsAWJtiKwv7LXKofQ@mail.gmail.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: dsa: realtek: silent indirect reg read failures on SMP | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply |
Hi Luiz, Thanks very much for the bug report! Luiz Angelo Daros de Luca <luizluca@gmail.com> writes: > Hello, > > Arinç reported an issue with interfaces going down and up in a fixed > interval (a little less than 40s). It happened with both SMI or MDIO > interfaces for two different rtl8365mb supported models. Is this using the Generic PHY driver or the Realtek RTL8365MB-VC PHY driver? If it's the latter then I would be interested to know whether the interrupt counters are going up too. Maybe the switch really has something to say? I think the Generic PHY driver is based on polling of PHY registers, which seems more likely to trigger the issue you describe. In any case, I think I can reproduce your issue. Using an SMI setup with just one cable attached to port 2, I did the following: 1. Enable ftrace event logging for regmap and mdio 2. Poll BSMR PHY register for my connected port; it should always read the same (0x79ed). 3. Wait for 2 to read out something else. So I used this command: while true; do phytool read swp2/2/0x01; sleep 0.1; done After a couple of seconds I already read out 0x0000. And after checking the ftrace log I can see that this is because of some interleaved register access: kworker/3:4-70 [003] ....... 1927.139849: regmap_reg_write: ethernet-switch reg=1004 val=bd phytool-16816 [002] ....... 1927.139979: regmap_reg_read: ethernet-switch reg=1f01 val=0 kworker/3:4-70 [003] ....... 1927.140381: regmap_reg_read: ethernet-switch reg=1005 val=0 phytool-16816 [002] ....... 1927.140468: regmap_reg_read: ethernet-switch reg=1d15 val=a69 kworker/3:4-70 [003] ....... 1927.140864: regmap_reg_read: ethernet-switch reg=1003 val=0 phytool-16816 [002] ....... 1927.140955: regmap_reg_write: ethernet-switch reg=1f02 val=2041 kworker/3:4-70 [003] ....... 1927.141390: regmap_reg_read: ethernet-switch reg=1002 val=0 phytool-16816 [002] ....... 1927.141479: regmap_reg_write: ethernet-switch reg=1f00 val=1 kworker/3:4-70 [003] ....... 1927.142311: regmap_reg_write: ethernet-switch reg=1004 val=be phytool-16816 [002] ....... 1927.142410: regmap_reg_read: ethernet-switch reg=1f01 val=0 kworker/3:4-70 [003] ....... 1927.142534: regmap_reg_read: ethernet-switch reg=1005 val=0 phytool-16816 [002] ....... 1927.142618: regmap_reg_read: ethernet-switch reg=1f04 val=0 phytool-16816 [002] ....... 1927.142641: mdio_access: SMI-0 read phy:0x02 reg:0x01 val:0x0000 kworker/3:4-70 [003] ....... 1927.143037: regmap_reg_read: ethernet-switch reg=1001 val=0 kworker/3:4-70 [003] ....... 1927.143133: regmap_reg_read: ethernet-switch reg=1000 val=2d89 kworker/3:4-70 [003] ....... 1927.143213: regmap_reg_write: ethernet-switch reg=1004 val=be kworker/3:4-70 [003] ....... 1927.143291: regmap_reg_read: ethernet-switch reg=1005 val=0 kworker/3:4-70 [003] ....... 1927.143368: regmap_reg_read: ethernet-switch reg=1003 val=0 kworker/3:4-70 [003] ....... 1927.143443: regmap_reg_read: ethernet-switch reg=1002 val=6 The kworker here is polling MIB counters for stats, as evidenced by the register 0x1004 that we are writing to (RTL8365MB_MIB_ADDRESS_REG). Sometimes I can even read out completely junk values if I tighten my phytool loop: # while true; do phytool read swp2/2/0x01; done | uniq 0x79ed 0000 0x79ed 0000 0x79ed 0000 0x79ed 0x0014 <-- ?! This is just a preliminary investigation - I only checked your mail today - but I will follow up on this next week and try and make a fix. In the mean time could you and Arinç perhaps do some similar tracing and confirm that in your case it is a similar issue? You needn't use phytool if the issue reproduces itself every 40 seconds - in that case just enable ftrace events and inspect the register access sequence when a PHY read gives an unexpected value. Let's please restrict the investigation to registers which we expect to give well-defined values though - unused or otherwise inactive registers may very well give us junk, but it is harder to argue that this is a bug. Anyway, this seems pretty easy to reproduce, so I don't think you need to go down that path :-) Overall we probably need some stricter locking but I want to spend some time on this before just sending a big-fat-lock solution to the list. I didn't try your patch yet due to other items in my inbox today. Kind regards, Alvin > > The indirect read procedure is something like this: > > rtl8365mb_phy_read > rtl8365mb_phy_ocp_read > rtl8365mb_phy_poll_busy > regmap_read_poll_timeout(RTL8365MB_INDIRECT_ACCESS_STATUS_REG) > rtl8365mb_phy_ocp_prepare > regmap_update_bits(RTL8365MB_GPHY_OCP_MSB_0_REG) > regmap_write(RTL8365MB_INDIRECT_ACCESS_ADDRESS_REG) > regmap_write(RTL8365MB_INDIRECT_ACCESS_CTRL_REG) > rtl8365mb_phy_poll_busy > regmap_read_poll_timeout(RTL8365MB_INDIRECT_ACCESS_STATUS_REG) > regmap_read(RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG) > > And the write is similar. The regmap read/write does have its own > mutex. realtek-mdio does use a sequence of read/writes to mdio for > each read/write ops but beyond regmap internal locks, I also lock > bus->mdio_lock. > > In a non-SMP system or with extra cores disabled, everything works as > expected. However, with an SMP system, there is some kind of > concurrent access to registers and > regmap_read(RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG) will eventually > read 0x0000. It is like the chip didn't have time to update that > register or got lost in the way. It only happens when the system is > idle. If you stress the CPU, the simultaneous register access gets > unsynchronized or serialized. > > I can "simulate" a similar issue in a non-SMP system by dumping regmap > from non-used registers like 0x20a0 to 0x20ff (0x20a0 is related to > port 5 state, but I have 0,1,2,3,4 + 6,7). There are other regions > that cause the same behavior but [0x20a0,0x20ff] is the first one I > found. It never failed while reading good memory regions. Maybe it is > just caused by another issue like the chip getting confused while > reading undefined memory regions. > > for a in $(seq 1000); do dd > if=/sys/kernel/debug/regmap/mdio-bus:1d/registers bs=11 > skip=$((0x20a0)) count=$((0x1)) 2>/dev/null; done > > (That range might not be precise as I don't know if dd is enough to > select only a specific range) > > While the script is running, I can see both the driver and the debugfs > dump flicking between good data, garbage and zero. When DSA reads > 0x0000, it brings the port down and up. > > I tried to add a simple mutex over rtl8365mb_phy_ocp_read call but it > still failed to prevent the event in a SMP system (it would not > protect the debugfs dump). Maybe I'm missing something obvious. > > I also wonder if all those functions that use a sequence of > regmap_{read,write} might need a mutex to protect from parallel > execution. I didn't find a way to create a "regmap transaction" by > locking it externally as its lock control is private as well as the > non-locked _regmap_read(). I could disable its lock and do it inside > the driver, but it would also disable the debugfs dump and add a bunch > of new code. And I don't even know if it would fix the issue. > Is there a better way to deal with this issue? Maybe force dsa polling > to use a single processor? > > This is the failed patch. > > diff --git a/drivers/net/dsa/realtek/rtl8365mb.c > b/drivers/net/dsa/realtek/rtl8365mb.c > index 1a0562811c1a..5572271a2514 100644 > --- a/drivers/net/dsa/realtek/rtl8365mb.c > +++ b/drivers/net/dsa/realtek/rtl8365mb.c > @@ -817,6 +817,7 @@ struct rtl8365mb_port { > * @port_mask: mask of all ports > * @learn_limit_max: maximum number of L2 addresses the chip can learn > * @mib_lock: prevent concurrent reads of MIB counters > + * @indirect_reg_lock: prevent concurrent access of indirect registers > * @ports: per-port data > * @jam_table: chip-specific initialization jam table > * @jam_size: size of the chip's jam table > @@ -832,6 +833,7 @@ struct rtl8365mb { > u32 port_mask; > u32 learn_limit_max; > struct mutex mib_lock; > + struct mutex indirect_reg_lock; > struct rtl8365mb_port ports[RTL8365MB_MAX_NUM_PORTS]; > const struct rtl8365mb_jam_tbl_entry *jam_table; > size_t jam_size; > @@ -989,6 +991,7 @@ static int rtl8365mb_phy_ocp_write(struct > realtek_priv *priv, int phy, > > static int rtl8365mb_phy_read(struct realtek_priv *priv, int phy, int regnum) > { > + struct rtl8365mb *mb = priv->chip_data; > u32 ocp_addr; > u16 val; > int ret; > @@ -1001,16 +1004,24 @@ static int rtl8365mb_phy_read(struct > realtek_priv *priv, int phy, int regnum) > > ocp_addr = RTL8365MB_PHY_OCP_ADDR_PHYREG_BASE + regnum * 2; > > - ret = rtl8365mb_phy_ocp_read(priv, phy, ocp_addr, &val); > + ret = mutex_lock_interruptible(&mb->indirect_reg_lock); > if (ret) { > dev_err(priv->dev, > - "failed to read PHY%d reg %02x @ %04x, ret %d\n", phy, > + "read PHY%d reg %02x @ %04x, ret %d interrupted\n", phy, > regnum, ocp_addr, ret); > return ret; > } > > - dev_dbg(priv->dev, "read PHY%d register 0x%02x @ %04x, val <- %04x\n", > - phy, regnum, ocp_addr, val); > + ret = rtl8365mb_phy_ocp_read(priv, phy, ocp_addr, &val); > + if (ret) > + dev_err(priv->dev, > + "failed to read PHY%d reg %02x @ %04x, ret %d\n", phy, > + regnum, ocp_addr, ret); > + else > + dev_dbg(priv->dev, "read PHY%d register 0x%02x @ %04x, > val <- %04x\n", > + phy, regnum, ocp_addr, val); > + > + mutex_unlock(&mb->indirect_reg_lock); > > return val; > } > @@ -1018,6 +1029,7 @@ static int rtl8365mb_phy_read(struct > realtek_priv *priv, int phy, int regnum) > static int rtl8365mb_phy_write(struct realtek_priv *priv, int phy, int regnum, > u16 val) > { > + struct rtl8365mb *mb = priv->chip_data; > u32 ocp_addr; > int ret; > > @@ -1029,18 +1041,26 @@ static int rtl8365mb_phy_write(struct > realtek_priv *priv, int phy, int regnum, > > ocp_addr = RTL8365MB_PHY_OCP_ADDR_PHYREG_BASE + regnum * 2; > > - ret = rtl8365mb_phy_ocp_write(priv, phy, ocp_addr, val); > + ret = mutex_lock_interruptible(&mb->indirect_reg_lock); > if (ret) { > dev_err(priv->dev, > - "failed to write PHY%d reg %02x @ %04x, ret %d\n", phy, > + "write PHY%d reg %02x @ %04x, ret %d > interrupted\n", phy, > regnum, ocp_addr, ret); > return ret; > } > > - dev_dbg(priv->dev, "write PHY%d register 0x%02x @ %04x, val -> %04x\n", > - phy, regnum, ocp_addr, val); > + ret = rtl8365mb_phy_ocp_write(priv, phy, ocp_addr, val); > + if (ret) > + dev_err(priv->dev, > + "failed to write PHY%d reg %02x @ %04x, ret %d\n", phy, > + regnum, ocp_addr, ret); > + else > + dev_dbg(priv->dev, "write PHY%d register 0x%02x @ > %04x, val -> %04x\n", > + phy, regnum, ocp_addr, val); > > - return 0; > + mutex_unlock(&mb->indirect_reg_lock); > + > + return ret; > } > > static int rtl8365mb_dsa_phy_read(struct dsa_switch *ds, int phy, int regnum) > @@ -2215,6 +2235,8 @@ static int rtl8365mb_setup(struct dsa_switch *ds) > > mb = priv->chip_data; > > + mutex_init(&mb->indirect_reg_lock); > + > ret = rtl8365mb_reset_chip(priv); > if (ret) { > dev_err(priv->dev, "failed to reset chip: %d\n", ret); > -- > 2.34.1 > > > --- > Luiz
' > Hi Luiz, Hi Alvin, > Thanks very much for the bug report! Thanks Alvin! > > Arinç reported an issue with interfaces going down and up in a fixed > > interval (a little less than 40s). It happened with both SMI or MDIO > > interfaces for two different rtl8365mb supported models. > > Is this using the Generic PHY driver or the Realtek RTL8365MB-VC PHY > driver? If it's the latter then I would be interested to know whether > the interrupt counters are going up too. Maybe the switch really has > something to say? Both are using 'Realtek RTL8365MB-VC PHY' but without interruptions. In the mdio-connected device, the interrupt pin is not connected yet (I suggested them to use it). The other one is unknown if the pin is connected to a gpio port. I suggested Arunç to export all available GPIO ports and check their values with the interruption configured as both active high and low. That's how I found the interrupt gpio in my device. So, all affected SMP devices are using polling. > I think the Generic PHY driver is based on polling of PHY registers, > which seems more likely to trigger the issue you describe. I didn't even know that 'Generic PHY' would work until yesterday when I noticed I was missing the patch that added the phy_id. Anyway, as I said, both drivers will work with polling while only 'RTL8365MB-VC PHY' can handle interruptions. It is not a good indicator whether it is using interruption or not. And MDIO also didn't support interruption without a new patch I just sent. https://patchwork.kernel.org/project/netdevbpf/patch/20220209224538.9028-1-luizluca@gmail.com/ > In any case, I think I can reproduce your issue. Using an SMI setup with > just one cable attached to port 2, I did the following: > > 1. Enable ftrace event logging for regmap and mdio > 2. Poll BSMR PHY register for my connected port; it should always read > the same (0x79ed). > 3. Wait for 2 to read out something else. > > So I used this command: > > while true; do phytool read swp2/2/0x01; sleep 0.1; done I'll test it and report. > > After a couple of seconds I already read out 0x0000. And after checking > the ftrace log I can see that this is because of some interleaved > register access: > > kworker/3:4-70 [003] ....... 1927.139849: regmap_reg_write: ethernet-switch reg=1004 val=bd > phytool-16816 [002] ....... 1927.139979: regmap_reg_read: ethernet-switch reg=1f01 val=0 > kworker/3:4-70 [003] ....... 1927.140381: regmap_reg_read: ethernet-switch reg=1005 val=0 > phytool-16816 [002] ....... 1927.140468: regmap_reg_read: ethernet-switch reg=1d15 val=a69 > kworker/3:4-70 [003] ....... 1927.140864: regmap_reg_read: ethernet-switch reg=1003 val=0 > phytool-16816 [002] ....... 1927.140955: regmap_reg_write: ethernet-switch reg=1f02 val=2041 > kworker/3:4-70 [003] ....... 1927.141390: regmap_reg_read: ethernet-switch reg=1002 val=0 > phytool-16816 [002] ....... 1927.141479: regmap_reg_write: ethernet-switch reg=1f00 val=1 > kworker/3:4-70 [003] ....... 1927.142311: regmap_reg_write: ethernet-switch reg=1004 val=be > phytool-16816 [002] ....... 1927.142410: regmap_reg_read: ethernet-switch reg=1f01 val=0 > kworker/3:4-70 [003] ....... 1927.142534: regmap_reg_read: ethernet-switch reg=1005 val=0 > phytool-16816 [002] ....... 1927.142618: regmap_reg_read: ethernet-switch reg=1f04 val=0 > phytool-16816 [002] ....... 1927.142641: mdio_access: SMI-0 read phy:0x02 reg:0x01 val:0x0000 > kworker/3:4-70 [003] ....... 1927.143037: regmap_reg_read: ethernet-switch reg=1001 val=0 > kworker/3:4-70 [003] ....... 1927.143133: regmap_reg_read: ethernet-switch reg=1000 val=2d89 > kworker/3:4-70 [003] ....... 1927.143213: regmap_reg_write: ethernet-switch reg=1004 val=be > kworker/3:4-70 [003] ....... 1927.143291: regmap_reg_read: ethernet-switch reg=1005 val=0 > kworker/3:4-70 [003] ....... 1927.143368: regmap_reg_read: ethernet-switch reg=1003 val=0 > kworker/3:4-70 [003] ....... 1927.143443: regmap_reg_read: ethernet-switch reg=1002 val=6 > > The kworker here is polling MIB counters for stats, as evidenced by the > register 0x1004 that we are writing to (RTL8365MB_MIB_ADDRESS_REG). At first I thought it was just two threads reading different indirect registers. And there are really more than one kernel threads reading port status in a SMP device. If you disable the extra cores, it was working as expected. So, I added a mutex to serialize that sequence. But it didn't fix the issue. It looks like something deep down is broken. > Sometimes I can even read out completely junk values if I tighten my > phytool loop: > > # while true; do phytool read swp2/2/0x01; done | uniq > 0x79ed > 0000 > 0x79ed > 0000 > 0x79ed > 0000 > 0x79ed > 0x0014 <-- ?! Yeah, just like what happens if you read that unused register. You generally get 0x0000 instead of the good data but sometimes random data. I also added multiple udelay over and inside the regmap_read with no success (while reading the unused register). I might test again with phytool. > This is just a preliminary investigation - I only checked your mail > today - but I will follow up on this next week and try and make a > fix. In the mean time could you and Arinç perhaps do some similar > tracing and confirm that in your case it is a similar issue? You needn't > use phytool if the issue reproduces itself every 40 seconds - in that > case just enable ftrace events and inspect the register access sequence > when a PHY read gives an unexpected value. I'll try to enable ftrace but when I added a couple of printk, the issue was gone. Let's see what my device can handle. The 40s is not a global cycle when everything falls apart. I added a simple test that printed an error when the expected value wasn't right. And each time a different port was failing in a fixed but unsynchronized rhythm. Even with disconnected ports. It looks like the switch "internal job queue" overflows and we get the null or random data. And that gives the switch a little time to process the queue. After that, the queue fills up again and we have the same behavior. Accessing the unused register might simply consume a lot of "extra loops". It looks like an exaggerated variant of the same issue. Or not. > Let's please restrict the investigation to registers which we expect to > give well-defined values though - unused or otherwise inactive registers > may very well give us junk, but it is harder to argue that this is a > bug. Anyway, this seems pretty easy to reproduce, so I don't think you > need to go down that path :-) I also believe that we need to stick to registers that should return good values. But while the indirect registers are failing, you can also dump some other switch registers and it should also return some junk data. I didn't test it but I believe that's what will happen. > Overall we probably need some stricter locking but I want to spend some > time on this before just sending a big-fat-lock solution to the > list. I didn't try your patch yet due to other items in my inbox today. If nothing serializes the access to the indirect registers, the lock is needed. Your traces show that two reads will collide. The regmap also has its own lock by default and I also lock the mdio during a regmap_read/write. I don't believe an extra lock will do. I guess we'll need to slow the regmap access, although it wasn't enough for me. Maybe the switch is yelling an interruption we are ignoring. This is another interesting thing to check. --- Luiz
> > So I used this command: > > > > while true; do phytool read swp2/2/0x01; sleep 0.1; done > > I'll test it and report. In my single processor device, I got stable readings: root@OpenWrt:/# while true; do phytool read lan1/2/0x01; done | time awk '!a[$1] ++' 0x79c9 ^CCommand terminated by signal 2 real 4m 56.44s user 0m 1.61s sys 0m 0.72s And most of the time I was executing three copies of that loop in parallel. I tried with and without my patches. All of them resulted in the same behavior. Sorry but I believe my non-SMP device is not "affected enough" to help debug this issue. Is your device SMP? --- Luiz
Hi Luiz, Thanks for the info. Luiz Angelo Daros de Luca <luizluca@gmail.com> writes: >> > So I used this command: >> > >> > while true; do phytool read swp2/2/0x01; sleep 0.1; done >> >> I'll test it and report. > > In my single processor device, I got stable readings: > > root@OpenWrt:/# while true; do phytool read lan1/2/0x01; done | time awk '!a[$1] > ++' > 0x79c9 > ^CCommand terminated by signal 2 > real 4m 56.44s > user 0m 1.61s > sys 0m 0.72s > > And most of the time I was executing three copies of that loop in parallel. > I tried with and without my patches. All of them resulted in the same behavior. > > Sorry but I believe my non-SMP device is not "affected enough" to help > debug this issue. Is your device SMP? Yes, it's SMP and also with PREEMPT_RT ;-) But no problem, I will look into this and get back to you in this thread if I need any more feedback. Thanks for your help. Kind regards, Alvin > > --- > Luiz
Hi again Luiz, Alvin Šipraga <ALSI@bang-olufsen.dk> writes: > Hi Luiz, > > Thanks for the info. > > Luiz Angelo Daros de Luca <luizluca@gmail.com> writes: > <snip> >> >> Sorry but I believe my non-SMP device is not "affected enough" to help >> debug this issue. Is your device SMP? > > Yes, it's SMP and also with PREEMPT_RT ;-) > > But no problem, I will look into this and get back to you in this thread > if I need any more feedback. Thanks for your help. So I had a closer look this afternoon. First some general statements. 1. Generally the switch driver is not accessing registers unless an administrator is reconfiguring the interfaces 2. The exception to the above rule is the delayed_work for stats64, which you can see in the code is there because get_stats64 runs in atomic context. This runs every 3 seconds. 3. Without the interrupt from the switch enabled, the PHY subsystem resorts to polling the PHY registers (every 1 second I think you said). As a result of (2) and (3) above, you are bound at some point to have both the stats work and the PHY register read callbacks executing in parallel. As I mentioned in my last email, a simple busy loop with phytool would return some nonsense pretty quickly. On my SMP/PREEMPT_RT system this happens every 3 seconds while everything else is idle. I tried to disable the stats delayed_work just to see, and in this case I did not observe any PHY read issues. The PHY register value was always as expected. In that setup I then tried to provoke the error again, this time by reading a single register with dd via regmap debugfs. And while it's unlikely for a single such read to interfere with my busy phytool loop, putting the dd read in a tight loop almost immediately provokes the same bug. This time I noticed that the value returned by phytool is the same value that I read out with dd from the other non-PHY-related register. In general what I found is that if we read from an arbitrary register A and this read coincides with the multi-register access in rtl8365mb_phy_ocp_read, then the final read from RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG will always return the value in register A. It is quite reliably the case in all my testing, and it explains the nonsense values we sometimes happened to see during PHY register access, because of the stats work going on in the background. Probably we got some MIB counter data and this corrupted the PHY register value held in the INDIRECT_ACCESS_READ_DATA_REG register. I am not sure why this happens - likely some peculiarity of the ASIC hardware - but I wanted to check if this is also the case for the MIB register access, because we also have a kind of indirect lookup algorithm there. But in that case I did not see any corruption of the data in the MIB counter data registers (RTL8365MB_MIB_COUNTER_REG(_x)). So my conclusion is that this problem is unique to the indirect PHY register access pattern. It should be pointed out that the regmap is already protected by a lock, so I don't expect to see any weird data races for non-PHY register access. One more thing I wanted to point out: you mentioned that on your system you conducted multiple phytool read loops and did not observe any issues. I think this is easily explained by some higher-level locking in the kernel which prevents concurrent PHY register reads. **** With all of that said, I think the solution here is simply to guard against stray register access while we are in the indirect PHY register access callbacks. You also posted a patch to forego the whole indirect access method for MDIO-connected switches, and I think that is also a good thing. My reply to that patch was just taking issue with your explanation, both because the diagnosis of the bug was rather nebulous, and also because it did not actually fix the bug - it just worked around it. I will take it upon myself to fix this issue of indirect PHY register access yielding corrupt values and post a patch to the list next week. I already have a quick-n-dirty change which ensures proper locking and now I cannot reproduce the issue for several hours. In the mean time, could you resend your MDIO direct-PHY-register-access patch and I will give it one more review. Please do not suggest that it is a fix for this bug (because it's not) -- better yet, just add a Link: to this thread and explain why you bothered implementing it to begin with. You can mention that the issue is not seen with direct-access (which also corroborates our findings here). Then I will base my changes on your patch. Alternatively you can drop the patch and we can just fix the indirect access wholesale - both for SMI and MDIO. That would mean adding less code (since MDIO with indirect access also works), albeit at the expense of some technically unnecessary gymnastics in the driver (since MDIO direct access is simpler). But I'll leave that up to you :-) What do you think? Kind regards, Alvin
> Hi again Luiz, Hello, > >> > >> Sorry but I believe my non-SMP device is not "affected enough" to help > >> debug this issue. Is your device SMP? > > > > Yes, it's SMP and also with PREEMPT_RT ;-) > > > > But no problem, I will look into this and get back to you in this thread > > if I need any more feedback. Thanks for your help. > > So I had a closer look this afternoon. First some general statements. > > 1. Generally the switch driver is not accessing registers unless an > administrator is reconfiguring the interfaces > 2. The exception to the above rule is the delayed_work for stats64, > which you can see in the code is there because get_stats64 runs in > atomic context. This runs every 3 seconds. > 3. Without the interrupt from the switch enabled, the PHY subsystem > resorts to polling the PHY registers (every 1 second I think you > said). > > As a result of (2) and (3) above, you are bound at some point to have > both the stats work and the PHY register read callbacks executing in > parallel. As I mentioned in my last email, a simple busy loop with > phytool would return some nonsense pretty quickly. On my SMP/PREEMPT_RT > system this happens every 3 seconds while everything else is idle. > > I tried to disable the stats delayed_work just to see, and in this case > I did not observe any PHY read issues. The PHY register value was always > as expected. > > In that setup I then tried to provoke the error again, this time by > reading a single register with dd via regmap debugfs. And while it's > unlikely for a single such read to interfere with my busy phytool loop, > putting the dd read in a tight loop almost immediately provokes the same > bug. This time I noticed that the value returned by phytool is the same > value that I read out with dd from the other non-PHY-related register. > > In general what I found is that if we read from an arbitrary register A > and this read coincides with the multi-register access in > rtl8365mb_phy_ocp_read, then the final read from > RTL8365MB_INDIRECT_ACCESS_READ_DATA_REG will always return the value in > register A. It is quite reliably the case in all my testing, and it > explains the nonsense values we sometimes happened to see during PHY > register access, because of the stats work going on in the > background. Probably we got some MIB counter data and this corrupted the > PHY register value held in the INDIRECT_ACCESS_READ_DATA_REG register. It makes sense. That explains why a simple lock over indirect access did not solve the issue. > I am not sure why this happens - likely some peculiarity of the ASIC > hardware - but I wanted to check if this is also the case for the MIB > register access, because we also have a kind of indirect lookup > algorithm there. But in that case I did not see any corruption of the > data in the MIB counter data registers (RTL8365MB_MIB_COUNTER_REG(_x)). > > So my conclusion is that this problem is unique to the indirect PHY > register access pattern. It should be pointed out that the regmap is > already protected by a lock, so I don't expect to see any weird data > races for non-PHY register access. > > > One more thing I wanted to point out: you mentioned that on your system > you conducted multiple phytool read loops and did not observe any > issues. I think this is easily explained by some higher-level locking > in the kernel which prevents concurrent PHY register reads. > > **** > > With all of that said, I think the solution here is simply to guard > against stray register access while we are in the indirect PHY register > access callbacks. You also posted a patch to forego the whole indirect > access method for MDIO-connected switches, and I think that is also a > good thing. My reply to that patch was just taking issue with your > explanation, both because the diagnosis of the bug was rather nebulous, > and also because it did not actually fix the bug - it just worked around > it. > > I will take it upon myself to fix this issue of indirect PHY register > access yielding corrupt values and post a patch to the list next week. > I already have a quick-n-dirty change which ensures proper locking and > now I cannot reproduce the issue for several hours. I tried to find a way to lock regmap while permitting it to be used only by the indirect reg access. However, I failed to find a clean solution. It's great you got a proper fix. > In the mean time, could you resend your MDIO direct-PHY-register-access > patch and I will give it one more review. Please do not suggest that it > is a fix for this bug (because it's not) -- better yet, just add a Link: > to this thread and explain why you bothered implementing it to begin > with. You can mention that the issue is not seen with direct-access > (which also corroborates our findings here). Then I will base my changes > on your patch. > > Alternatively you can drop the patch and we can just fix the indirect > access wholesale - both for SMI and MDIO. That would mean adding less > code (since MDIO with indirect access also works), albeit at the expense > of some technically unnecessary gymnastics in the driver (since MDIO > direct access is simpler). But I'll leave that up to you :-) > > What do you think? > I would prefer to drop it if we get a shared fix. There is an ongoing discussion that might allow us to drop the realtek-smi internal mdio and share phy_read/write between both interfaces. In that case, that different mdio code path will fit much better as an "if" inside those functions. > Kind regards, > Alvin Regards, Luiz
Hi Luiz, Luiz Angelo Daros de Luca <luizluca@gmail.com> writes: > > Hello, <snip> > I would prefer to drop it if we get a shared fix. There is an ongoing > discussion that might allow us to drop the realtek-smi internal mdio > and share phy_read/write between both interfaces. In that case, that > different mdio code path will fit much better as an "if" inside those > functions. OK, sounds good! Kind regards, Alvin
diff --git a/drivers/net/dsa/realtek/rtl8365mb.c b/drivers/net/dsa/realtek/rtl8365mb.c index 1a0562811c1a..5572271a2514 100644 --- a/drivers/net/dsa/realtek/rtl8365mb.c +++ b/drivers/net/dsa/realtek/rtl8365mb.c @@ -817,6 +817,7 @@ struct rtl8365mb_port { * @port_mask: mask of all ports * @learn_limit_max: maximum number of L2 addresses the chip can learn * @mib_lock: prevent concurrent reads of MIB counters + * @indirect_reg_lock: prevent concurrent access of indirect registers * @ports: per-port data * @jam_table: chip-specific initialization jam table * @jam_size: size of the chip's jam table @@ -832,6 +833,7 @@ struct rtl8365mb { u32 port_mask; u32 learn_limit_max; struct mutex mib_lock; + struct mutex indirect_reg_lock; struct rtl8365mb_port ports[RTL8365MB_MAX_NUM_PORTS]; const struct rtl8365mb_jam_tbl_entry *jam_table; size_t jam_size; @@ -989,6 +991,7 @@ static int rtl8365mb_phy_ocp_write(struct realtek_priv *priv, int phy, static int rtl8365mb_phy_read(struct realtek_priv *priv, int phy, int regnum) { + struct rtl8365mb *mb = priv->chip_data; u32 ocp_addr; u16 val;