Message ID | 20150321205359.GM8656@n2100.arm.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Hi Russell,

On Sat, Mar 21, 2015 at 5:53 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> Given that this bug can seriously screw data up in undetectable ways (TCP
> checksums don't save you, because the FEC generates them on the data which
> it read from memory, even if it happened to read the data from the SoC's
> boot ROM) we do need to get this fixed ASAP.

Current mainline has 2b995f63987013 reverted, so 4.0-rc5 will not have
this corruption problem.

Regards,

Fabio Estevam
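Russell's point about checksums can be illustrated with a small user-space sketch. It is not driver code: `inet_csum`, `csum_ok`, and `demo_offload` are hypothetical names, and the "hardware" is only simulated by computing the checksum over whatever bytes were fetched. Assuming a 16-bit one's-complement checksum in the style of RFC 1071, a checksum computed after a bad fetch matches the bad data, so the receiver has nothing to object to.

```c
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement checksum in the style of RFC 1071
 * (hypothetical helper, not taken from the FEC driver). */
static uint16_t inet_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Receiver-side check: recompute and compare with the transmitted value. */
static int csum_ok(const uint8_t *buf, size_t len, uint16_t csum)
{
	return inet_csum(buf, len) == csum;
}

/*
 * Compare a checksum computed in software from the intended payload with
 * one computed (as offload hardware would) from the bytes actually fetched.
 */
static void demo_offload(int *sw_detects, int *offload_detects)
{
	const uint8_t intended[8] = "payload"; /* what the stack meant to send */
	const uint8_t fetched[8] = "garbage";  /* what a bad DMA read returned */
	uint16_t sw = inet_csum(intended, sizeof(intended));
	uint16_t hw = inet_csum(fetched, sizeof(fetched));

	/* Software checksum was made before the fetch, so it mismatches. */
	*sw_detects = !csum_ok(fetched, sizeof(fetched), sw);
	/* Offloaded checksum was made from the wrong bytes, so it matches. */
	*offload_detects = !csum_ok(fetched, sizeof(fetched), hw);
}
```

With these two buffers the software checksum exposes the corruption while the simulated offloaded one does not, which is why the problem is only noticed at a higher level (for example, a broken build or a failed git merge over NFS).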
Hi!

22.03.2015, 01:35, "Fabio Estevam" <festevam@gmail.com>:
> Hi Russell,
>
> On Sat, Mar 21, 2015 at 5:53 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> Given that this bug can seriously screw data up in undetectable ways (TCP
>> checksums don't save you, because the FEC generates them on the data which
>> it read from memory, even if it happened to read the data from the SoC's
>> boot ROM) we do need to get this fixed ASAP.
>
> Current mainline has 2b995f63987013 reverted, so 4.0-rc5 will not have
> this corruption problem.

I've tested current mainline, and mainline plus commit 2b995f63987013 with
Russell's fix applied. Both work fine, without corruption.

--
??????
23.03.2015, 05:42, "fugang.duan@freescale.com" <fugang.duan@freescale.com>:
> From: Fabio Estevam <festevam@gmail.com> Sent: Sunday, March 22, 2015 6:36 AM
>> To: Russell King - ARM Linux
>> Cc: ????? ??????; Duan Fugang-B38611; netdev@vger.kernel.org; linux-arm-kernel
>> Subject: Re: Bug in drivers/net/ethernet/freescale/fec_main.c, TX is
>> broken. In 4.0.0-rc3
>>
>> Hi Russell,
>>
>> On Sat, Mar 21, 2015 at 5:53 PM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>>> Given that this bug can seriously screw data up in undetectable ways
>>> (TCP checksums don't save you, because the FEC generates them on the
>>> data which it read from memory, even if it happened to read the data
>>> from the SoC's boot ROM) we do need to get this fixed ASAP.
>> Current mainline has 2b995f63987013 reverted, so 4.0-rc5 will not have
>> this corruption problem.
>>
>> Regards,
>>
>> Fabio Estevam
>
> We cannot revert the commit 2b995f63987013, otherwise there introduce other issue. The correct fix method is Russell King's fix in the previous mail.
> It is strange thing that I cannot reproduce the issue on i.MX6q sabresd board. Anyway, we must consider TSO case that it's not a fragmented skb.

It is just a DMA_API_DEBUG=y error versus several data corruption errors.
DMA_API_DEBUG can be wrong too.

And did you do the check with that option enabled? It can delay the kernel
enough for the data to actually be written to the network before the code in
the commit frees the not-yet-sent data blocks. I have it disabled all the time.

You can check it by compiling a kernel over NFS, doing big git merges over
NFS, doing a big ftp transfer, etc.

--
??????
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index f9c0baea12ed..8bb2a811df3e 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1227,8 +1227,7 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id)
 			skb = txq->tx_skbuff[index];
 			bdnum++;
 		}
-		if (skb_shinfo(skb)->nr_frags &&
-		    (status = bdp_t->cbd_sc) & BD_ENET_TX_READY)
+		if ((status = bdp_t->cbd_sc) & BD_ENET_TX_READY)
 			break;
 		for (i = 0; i < bdnum; i++) {
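The effect of the one-line change above can be sketched outside the kernel. The following is a simplified model, not the driver: `struct demo_slot`, `DEMO_TX_READY`, and `demo_reclaim` are hypothetical stand-ins for the buffer descriptor, `BD_ENET_TX_READY`, and the reclaim loop in `fec_enet_tx_queue()`. It assumes, as the diff suggests, that a descriptor with the ready bit set is still owned by the hardware and must not be reclaimed, whether or not the skb has fragments.

```c
#include <stdint.h>

#define DEMO_TX_READY 0x8000u /* stand-in for BD_ENET_TX_READY */

/* Simplified TX slot: descriptor status plus the skb's fragment count. */
struct demo_slot {
	uint16_t sc;           /* status/control word, like cbd_sc */
	unsigned int nr_frags; /* like skb_shinfo(skb)->nr_frags */
};

/*
 * Walk the ring from 'dirty' towards 'cur' and count the slots that would
 * be reclaimed (unmapped and freed).
 *
 * frags_only != 0 models the old condition, which tested the ready bit
 * only for fragmented skbs; frags_only == 0 models the patched condition,
 * which tests it for every skb.
 */
static unsigned int demo_reclaim(const struct demo_slot *ring,
				 unsigned int ring_size,
				 unsigned int dirty, unsigned int cur,
				 int frags_only)
{
	unsigned int reclaimed = 0;

	while (dirty != cur) {
		const struct demo_slot *slot = &ring[dirty];
		int check = frags_only ? slot->nr_frags != 0 : 1;

		/* Stop at the first slot the hardware has not sent yet. */
		if (check && (slot->sc & DEMO_TX_READY))
			break;

		dirty = (dirty + 1) % ring_size;
		reclaimed++;
	}
	return reclaimed;
}
```

For a ring where slot 0 is finished and slot 1 holds a linear (non-fragmented) frame that is still marked ready, the patched condition reclaims one slot and stops, while the old condition walks past slot 1 and releases a buffer the hardware may not have read yet.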