Message ID | 20201209184740.16473-1-w@1wt.eu (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Revert "macb: support the two tx descriptors on at91rm9200" | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Guessed tree name to be net-next |
netdev/subject_prefix | warning | Target tree name not specified in the subject |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 57 this patch: 57 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 102 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 57 this patch: 57 |
netdev/header_inline | success | Link |
netdev/stable | success | Stable not CCed |
From: Willy Tarreau <w@1wt.eu> Date: Wed, 9 Dec 2020 19:47:40 +0100 > This reverts commit 0a4e9ce17ba77847e5a9f87eed3c0ba46e3f82eb. > > The code was developed and tested on an MSC313E SoC, which seems to be > half-way between the AT91RM9200 and the AT91SAM9260 in that it supports > both the 2-descriptors mode and a Tx ring. > > It turns out that after the code was merged I could notice that the > controller would sometimes lock up, and only when dealing with sustained > bidirectional transfers, in which case it would report a Tx overrun > condition right after having reported being ready, and will stop sending > even after the status is cleared (a down/up cycle fixes it though). > > After adding lots of traces I couldn't spot a sequence pattern allowing > to predict that this situation would happen. The chip comes with no > documentation and other bits are often reported with no conclusive > pattern either. > > It is possible that my change is wrong just like it is possible that > the controller on the chip is bogus or at least unpredictable based on > existing docs from other chips. I do not have an RM9200 at hand to test > at the moment and a few tests run on a more recent 9G20 indicate that > this code path cannot be used there to test the code on a 3rd platform. > > Since the MSC313E works fine in the single-descriptor mode, and that > people using the old RM9200 very likely favor stability over performance, > better revert this patch until we can test it on the original platform > this part of the driver was written for. Note that the reverted patch > was actually tested on MSC313E. > > Cc: Nicolas Ferre <nicolas.ferre@microchip.com> > Cc: Claudiu Beznea <claudiu.beznea@microchip.com> > Cc: Daniel Palmer <daniel@0x0f.com> > Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> > Link: https://lore.kernel.org/netdev/20201206092041.GA10646@1wt.eu/ > Signed-off-by: Willy Tarreau <w@1wt.eu> Applied, thanks.
Hi Willy, I've been messing with the SSD202D (sibling of the MSC313E) recently and the ethernet performance was awful. I remembered this revert and reverted it and it makes the ethernet work pretty well. So I would like to find some way of making this patch work and I did some digging.. On Thu, 10 Dec 2020 at 03:47, Willy Tarreau <w@1wt.eu> wrote: > > This reverts commit 0a4e9ce17ba77847e5a9f87eed3c0ba46e3f82eb. > > The code was developed and tested on an MSC313E SoC, which seems to be > half-way between the AT91RM9200 and the AT91SAM9260 in that it supports > both the 2-descriptors mode and a Tx ring. The MSC313E and later seem to have a hidden "TX ring" with 4 descriptors. It looks like instead of implementing a ring in memory like the RX side has and all other macb variations they decided to put a 4 entry FIFO behind the TAR/TCR registers. You can load TAR/TCR normally so it is backwards compatible to the at91rm9200 but you can fill it with multiple frames to transmit without waiting for TX to end. There are some extra bits in TSR that seem to indicate how many frames are still waiting to be transmitted and the way BNQ works is a little different. > It turns out that after the code was merged I could notice that the > controller would sometimes lock up, and only when dealing with sustained > bidirectional transfers, in which case it would report a Tx overrun > condition right after having reported being ready, and will stop sending > even after the status is cleared (a down/up cycle fixes it though). I can reproduce this with iperf3's bidirectional mode without fail. I hacked up the driver a bit so that the TX path sends each frame 6 times recording the TSR register before and after to see what is happening: # udhcpc udhcpc: started, v1.31.1 [ 12.944302] Doing phy power up [ 13.147460] macb 1f2a2000.emac eth0: PHY [1f2a2000.emac-ffffffff:00] driver [msc313e phy] (irq=POLL) [ 13.156655] macb 1f2a2000.emac eth0: configuring for phy/mii link mode udhcpc: sending discover [ 15.205691] macb 1f2a2000.emac eth0: Link is Up - 100Mbps/Full - flow control off [ 15.213358] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 15.245235] pre0 41001fb8, post0 30001fb0 -> IDLETSR clear [ 15.249291] pre1 30001f90, post1 20001d90 -> FIFOIDLE1 clear [ 15.253331] pre2 20001d90, post2 10001990 -> FIFOIDLE2 clear [ 15.257385] pre3 10001990, post3 00001190 -> FIFOIDLE3 clear [ 15.261435] pre4 00001190, post4 f0000191 -> FIFOIDLE4 clear, OVR set [ 15.265485] pre5 f0000190, post5 f0000191 -> OVR set [ 15.269535] pre6 f0000190, post6 e0000181 -> OVR set, BNQ clear There seems to be a FIFO empty counter in the top of the register but this is totally undocumented. There are two new status bits TBNQ and FBNQ at bits 7 and 8. I have no idea what they mean. Bits 9 through 12 are some new status bits that seem to show if a FIFO slot is inuse. I can't find any mention of these bits anywhere except the header of the vendor driver so I think these are specific to MStar's macb. The interesting part though is that BNQ does not get cleared until multiple frames have been pushed in and after OVR is set. I think this is what breaks your code when it runs on the MSC313E version of MACB. Anyhow. I'm working on a version of your patch that should work with both the at91rm9200 and the MSC313E. Thanks, Daniel
Hi Daniel, On Tue, Apr 06, 2021 at 07:04:58PM +0900, Daniel Palmer wrote: > Hi Willy, > > I've been messing with the SSD202D (sibling of the MSC313E) recently > and the ethernet performance was awful. > I remembered this revert and reverted it and it makes the ethernet > work pretty well. OK, that's reassuring, at least they use the same IP blocks. > [ 15.213358] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > [ 15.245235] pre0 41001fb8, post0 30001fb0 -> IDLETSR clear > [ 15.249291] pre1 30001f90, post1 20001d90 -> FIFOIDLE1 clear > [ 15.253331] pre2 20001d90, post2 10001990 -> FIFOIDLE2 clear > [ 15.257385] pre3 10001990, post3 00001190 -> FIFOIDLE3 clear > [ 15.261435] pre4 00001190, post4 f0000191 -> FIFOIDLE4 clear, OVR set > [ 15.265485] pre5 f0000190, post5 f0000191 -> OVR set > [ 15.269535] pre6 f0000190, post6 e0000181 -> OVR set, BNQ clear > > There seems to be a FIFO empty counter in the top of the register but > this is totally undocumented. Yes that's extremely visible. I did notice some apparently random values at the top of the register as well without being able to determine what they were for, because I was pretty much convinced I was facing a 2-deep queue only. You had a good idea to force it to duplicate packets :-) > There are two new status bits TBNQ and FBNQ at bits 7 and 8. I have no > idea what they mean. Maybe they're related to tx queue empty / tx queue full. Just guessing. Since all these bits tend not to be reset until written to, I got confused a few times trying to understand what they indicated. > Bits 9 through 12 are some new status bits that seem to show if a FIFO > slot is inuse. Maybe indeed. > I can't find any mention of these bits anywhere except the header of > the vendor driver so I think these are specific to MStar's macb. > The interesting part though is that BNQ does not get cleared until > multiple frames have been pushed in and after OVR is set. > I think this is what breaks your code when it runs on the MSC313E > version of MACB. Yes that's very possible, because the behavior was inconsistent with what was documented for older chips. Also BNQ changed its meaning on more recent chips, so it may very well have yet another one here. > Anyhow. I'm working on a version of your patch that should work with > both the at91rm9200 and the MSC313E. Cool! Thanks for letting me know. If you need me to run some test, let me know (just don't expect too short latency these days but I definitely will test). Cheers, Willy
Hi, On 07/04/2021 10:42:07+0200, Willy Tarreau wrote: > Hi Daniel, > > On Tue, Apr 06, 2021 at 07:04:58PM +0900, Daniel Palmer wrote: > > Hi Willy, > > > > I've been messing with the SSD202D (sibling of the MSC313E) recently > > and the ethernet performance was awful. > > I remembered this revert and reverted it and it makes the ethernet > > work pretty well. > > OK, that's reassuring, at least they use the same IP blocks. > > > [ 15.213358] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready > > [ 15.245235] pre0 41001fb8, post0 30001fb0 -> IDLETSR clear > > [ 15.249291] pre1 30001f90, post1 20001d90 -> FIFOIDLE1 clear > > [ 15.253331] pre2 20001d90, post2 10001990 -> FIFOIDLE2 clear > > [ 15.257385] pre3 10001990, post3 00001190 -> FIFOIDLE3 clear > > [ 15.261435] pre4 00001190, post4 f0000191 -> FIFOIDLE4 clear, OVR set > > [ 15.265485] pre5 f0000190, post5 f0000191 -> OVR set > > [ 15.269535] pre6 f0000190, post6 e0000181 -> OVR set, BNQ clear > > > > There seems to be a FIFO empty counter in the top of the register but > > this is totally undocumented. > > Yes that's extremely visible. I did notice some apparently random > values at the top of the register as well without being able to > determine what they were for, because I was pretty much convinced I > was facing a 2-deep queue only. You had a good idea to force it to > duplicate packets :-) > > > There are two new status bits TBNQ and FBNQ at bits 7 and 8. I have no > > idea what they mean. > > Maybe they're related to tx queue empty / tx queue full. Just guessing. > Since all these bits tend not to be reset until written to, I got confused > a few times trying to understand what they indicated. > > > Bits 9 through 12 are some new status bits that seem to show if a FIFO > > slot is inuse. > > Maybe indeed. > > > I can't find any mention of these bits anywhere except the header of > > the vendor driver so I think these are specific to MStar's macb. > > The interesting part though is that BNQ does not get cleared until > > multiple frames have been pushed in and after OVR is set. > > I think this is what breaks your code when it runs on the MSC313E > > version of MACB. > > Yes that's very possible, because the behavior was inconsistent with > what was documented for older chips. Also BNQ changed its meaning on > more recent chips, so it may very well have yet another one here. > > > Anyhow. I'm working on a version of your patch that should work with > > both the at91rm9200 and the MSC313E. > > Cool! Thanks for letting me know. If you need me to run some test, let > me know (just don't expect too short latency these days but I definitely > will test). > Note that I have my rm9200ek home now and I could also run some tests. (I do hope it still boots a recent kernel).
Hi Willy On Wed, 7 Apr 2021 at 17:42, Willy Tarreau <w@1wt.eu> wrote: > > There are two new status bits TBNQ and FBNQ at bits 7 and 8. I have no > > idea what they mean. > > Maybe they're related to tx queue empty / tx queue full. Just guessing. > Since all these bits tend not to be reset until written to, I got confused > a few times trying to understand what they indicated. In the vendor driver they have some weird logic that counts idle, bnq, and these 6 new bits, and if the count of set bits is bigger than 5 it's apparently OK to setup another frame to send. I guess it's some really optimized form of checking for the right flags but it's over my head. :D You have to check it's OK to send right before setting the frame address/length otherwise you will get an overflow eventually. So I think some of these bits are related to the state machine that is handling popping frames out of the FIFO and the bits might change between getting a TX complete interrupt and trying to queue another frame. i.e. you can have a situation where there seems to be a free slot in the irq handler but it's not actually safe to queue it when xmit is called, and if you do an overflow happens and the TX locks up. > > Anyhow. I'm working on a version of your patch that should work with > > both the at91rm9200 and the MSC313E. > > Cool! Thanks for letting me know. If you need me to run some test, let > me know (just don't expect too short latency these days but I definitely > will test). I got this working really well last night. 10+ hours with the iperf3 bidir test and it's still going. Before TX locked up within a few seconds. Once I've cleaned it up a bit more I'll push it to my mstar tree. I don't think it can be mainlined yet as it's not usable without a bunch of other bits. Cheers, Daniel
diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h index 85be045..9fbdf53 100644 --- a/drivers/net/ethernet/cadence/macb.h +++ b/drivers/net/ethernet/cadence/macb.h @@ -1216,8 +1216,6 @@ struct macb { /* AT91RM9200 transmit queue (1 on wire + 1 queued) */ struct macb_tx_skb rm9200_txq[2]; - unsigned int rm9200_tx_tail; - unsigned int rm9200_tx_len; unsigned int max_tx_length; u64 ethtool_stats[GEM_STATS_LEN + QUEUE_STATS_LEN * MACB_MAX_QUEUES]; diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index 817c7b0..352ae7f 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -3936,7 +3936,6 @@ static int at91ether_start(struct macb *lp) MACB_BIT(ISR_TUND) | MACB_BIT(ISR_RLE) | MACB_BIT(TCOMP) | - MACB_BIT(RM9200_TBRE) | MACB_BIT(ISR_ROVR) | MACB_BIT(HRESP)); @@ -3953,7 +3952,6 @@ static void at91ether_stop(struct macb *lp) MACB_BIT(ISR_TUND) | MACB_BIT(ISR_RLE) | MACB_BIT(TCOMP) | - MACB_BIT(RM9200_TBRE) | MACB_BIT(ISR_ROVR) | MACB_BIT(HRESP)); @@ -4023,10 +4021,11 @@ static netdev_tx_t at91ether_start_xmit(struct sk_buff *skb, struct net_device *dev) { struct macb *lp = netdev_priv(dev); - unsigned long flags; - if (lp->rm9200_tx_len < 2) { - int desc = lp->rm9200_tx_tail; + if (macb_readl(lp, TSR) & MACB_BIT(RM9200_BNQ)) { + int desc = 0; + + netif_stop_queue(dev); /* Store packet information (to free when Tx completed) */ lp->rm9200_txq[desc].skb = skb; @@ -4040,15 +4039,6 @@ static netdev_tx_t at91ether_start_xmit(struct sk_buff *skb, return NETDEV_TX_OK; } - spin_lock_irqsave(&lp->lock, flags); - - lp->rm9200_tx_tail = (desc + 1) & 1; - lp->rm9200_tx_len++; - if (lp->rm9200_tx_len > 1) - netif_stop_queue(dev); - - spin_unlock_irqrestore(&lp->lock, flags); - /* Set address of the data in the Transmit Address register */ macb_writel(lp, TAR, lp->rm9200_txq[desc].mapping); /* Set length of the packet in the Transmit Control register */ @@ -4114,8 +4104,6 @@ static irqreturn_t at91ether_interrupt(int irq, void *dev_id) struct macb *lp = netdev_priv(dev); u32 intstatus, ctl; unsigned int desc; - unsigned int qlen; - u32 tsr; /* MAC Interrupt Status register indicates what interrupts are pending. * It is automatically cleared once read. @@ -4127,39 +4115,21 @@ static irqreturn_t at91ether_interrupt(int irq, void *dev_id) at91ether_rx(dev); /* Transmit complete */ - if (intstatus & (MACB_BIT(TCOMP) | MACB_BIT(RM9200_TBRE))) { + if (intstatus & MACB_BIT(TCOMP)) { /* The TCOM bit is set even if the transmission failed */ if (intstatus & (MACB_BIT(ISR_TUND) | MACB_BIT(ISR_RLE))) dev->stats.tx_errors++; - spin_lock(&lp->lock); - - tsr = macb_readl(lp, TSR); - - /* we have three possibilities here: - * - all pending packets transmitted (TGO, implies BNQ) - * - only first packet transmitted (!TGO && BNQ) - * - two frames pending (!TGO && !BNQ) - * Note that TGO ("transmit go") is called "IDLE" on RM9200. - */ - qlen = (tsr & MACB_BIT(TGO)) ? 0 : - (tsr & MACB_BIT(RM9200_BNQ)) ? 1 : 2; - - while (lp->rm9200_tx_len > qlen) { - desc = (lp->rm9200_tx_tail - lp->rm9200_tx_len) & 1; + desc = 0; + if (lp->rm9200_txq[desc].skb) { dev_consume_skb_irq(lp->rm9200_txq[desc].skb); lp->rm9200_txq[desc].skb = NULL; dma_unmap_single(&lp->pdev->dev, lp->rm9200_txq[desc].mapping, lp->rm9200_txq[desc].size, DMA_TO_DEVICE); dev->stats.tx_packets++; dev->stats.tx_bytes += lp->rm9200_txq[desc].size; - lp->rm9200_tx_len--; } - - if (lp->rm9200_tx_len < 2 && netif_queue_stopped(dev)) - netif_wake_queue(dev); - - spin_unlock(&lp->lock); + netif_wake_queue(dev); } /* Work-around for EMAC Errata section 41.3.1 */
This reverts commit 0a4e9ce17ba77847e5a9f87eed3c0ba46e3f82eb. The code was developed and tested on an MSC313E SoC, which seems to be half-way between the AT91RM9200 and the AT91SAM9260 in that it supports both the 2-descriptors mode and a Tx ring. It turns out that after the code was merged I could notice that the controller would sometimes lock up, and only when dealing with sustained bidirectional transfers, in which case it would report a Tx overrun condition right after having reported being ready, and will stop sending even after the status is cleared (a down/up cycle fixes it though). After adding lots of traces I couldn't spot a sequence pattern allowing to predict that this situation would happen. The chip comes with no documentation and other bits are often reported with no conclusive pattern either. It is possible that my change is wrong just like it is possible that the controller on the chip is bogus or at least unpredictable based on existing docs from other chips. I do not have an RM9200 at hand to test at the moment and a few tests run on a more recent 9G20 indicate that this code path cannot be used there to test the code on a 3rd platform. Since the MSC313E works fine in the single-descriptor mode, and that people using the old RM9200 very likely favor stability over performance, better revert this patch until we can test it on the original platform this part of the driver was written for. Note that the reverted patch was actually tested on MSC313E. Cc: Nicolas Ferre <nicolas.ferre@microchip.com> Cc: Claudiu Beznea <claudiu.beznea@microchip.com> Cc: Daniel Palmer <daniel@0x0f.com> Cc: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/netdev/20201206092041.GA10646@1wt.eu/ Signed-off-by: Willy Tarreau <w@1wt.eu> --- drivers/net/ethernet/cadence/macb.h | 2 -- drivers/net/ethernet/cadence/macb_main.c | 46 ++++++-------------------------- 2 files changed, 8 insertions(+), 40 deletions(-)