Message ID | 20141119160108.GJ24819@dev0.local (mailing list archive) |
---|---|
State | New, archived |
Dear Francesco Dolcini,

On Wed, 19 Nov 2014 17:01:08 +0100, Francesco Dolcini wrote:
> On Tue, Nov 18, 2014 at 03:30:35PM +0100, Thomas Petazzoni wrote:
> > It could indeed be related. I have marked the relevant patches in the
> > "ARM: mvebu: no I/O coherency on non-SMP and related updates" series as
> > to be backported to stable up to v3.8, so when they get accepted, I'll
> > take care of backporting them.
>
> I prepared and tested this small patch to fix the problem on kernel
> 3.13.11 and it seems to fix my ethernet packet corruption problem.
> Do you think it is fine?

There's one missing thing: as of 3.13, the mvebu-mbus driver was
directly looking at the DT to see if it had a coherency fabric node,
and if that's the case, then it was enabling the per-MBus window bit
telling that this window uses HW I/O coherency. I'm not sure it causes
some problems in practice, since with your patch all the cache
maintenance operations are anyway properly re-enabled. But still, I'd
suggest to modify the mvebu-mbus driver accordingly. See:

	np = of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric");
	if (np) {
		mbus->hw_io_coherency = 1;
		of_node_put(np);
	}

in drivers/bus/mvebu-mbus.c.

> Do you think that this bug on I/O cache coherency could also trigger some
> sporadic random OOPS and kernel panic? I got an OOPS with a broken LR in
> skb_segment() and a kernel panic in put_page(), but I was never able to
> reproduce any of them.

It's hard to say exactly what could happen with the wrong I/O cache
coherency setup. I would expect only the buffers used for DMA to not be
updated properly, but I might be wrong.

Thomas
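
For illustration only, the mvebu-mbus adjustment suggested above might come down to the decision below. This is a user-space model, not kernel code and not a tested patch: `struct mbus_state_model` and `mbus_model_init()` are hypothetical names, `has_fabric_node` stands in for `of_find_compatible_node()` returning a non-NULL node, and `running_smp` stands in for `is_smp()`. It assumes the intent is that the per-window coherency bit should follow the same non-SMP rule as the coherency.c change.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the mvebu-mbus driver state. Only the
 * hw_io_coherency field is taken from the thread; the struct name is
 * an assumption made for this sketch.
 */
struct mbus_state_model {
	int hw_io_coherency;
};

/*
 * Model of the suggested check: mark MBus windows as using HW I/O
 * coherency only when a "marvell,coherency-fabric" node is present
 * AND the kernel is running SMP.
 *
 * has_fabric_node: models of_find_compatible_node() != NULL
 * running_smp:     models is_smp()
 */
static void mbus_model_init(struct mbus_state_model *mbus,
			    bool has_fabric_node, bool running_smp)
{
	mbus->hw_io_coherency = (has_fabric_node && running_smp) ? 1 : 0;
}
```

With this rule, a non-SMP boot leaves `hw_io_coherency` at 0 even when the DT describes a coherency fabric, which would match the behaviour of the coherency.c patch in this thread.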
On Wed, Nov 19, 2014 at 05:40:07PM +0100, Thomas Petazzoni wrote:
> > Do you think that this bug on I/O cache coherency could also trigger some
> > sporadic random OOPS and kernel panic? I got an OOPS with a broken LR in
> > skb_segment() and a kernel panic in put_page(), but I was never able to
> > reproduce any of them.
>
> It's hard to say exactly what could happen with the wrong I/O cache
> coherency setup. I would expect only the buffers used for DMA to not be
> updated properly, but I might be wrong.

Interestingly I used to experience some random panics under high network
loads on the mirabox and I never knew whether they were attributed to the
power supply or to cache corruption. But since I have modified the driver
and cache management to synchronize caches before the Rx loop, I haven't
encountered them anymore. It could be a pure coincidence just like it
could also be more or less related, maybe due to the fact that the cache
is synchronized much earlier than the data are used and that this changes
the access patterns.

Just my few cents,
Willy
--- a/arch/arm/mach-mvebu/coherency.c
+++ b/arch/arm/mach-mvebu/coherency.c
@@ -124,6 +124,12 @@
 {
 	struct device_node *np;
 
+	if (!is_smp())
+	{
+		pr_info("Coherency fabric disabled\n");
+		return 0;
+	}
+
 	np = of_find_matching_node(NULL, of_coherency_table);
 	if (np) {
 		struct resource res;
@@ -150,6 +156,9 @@
 {
 	struct device_node *np;
 
+	if (!is_smp())
+		return 0;
+
 	np = of_find_matching_node(NULL, of_coherency_table);
 	if (np) {
 		bus_register_notifier(&platform_bus_type,