
ARM: mvebu: ethernet packets corruption and I/O coherency

Message ID 20141119160108.GJ24819@dev0.local (mailing list archive)
State New, archived

Commit Message

Francesco Dolcini Nov. 19, 2014, 4:01 p.m. UTC
Hi Thomas and all

On Tue, Nov 18, 2014 at 03:30:35PM +0100, Thomas Petazzoni wrote:
> It could indeed be related. I have marked the relevant patches in the 
> "ARM: mvebu: no I/O coherency on non-SMP and related updates" series as
> to be backported to stable up to v3.8, so when they get accepted, I'll
> take care of backporting them.

I prepared and tested this small patch to fix the problem on kernel
3.13.11 and it seems to fix my ethernet packet corruption problem.
Do you think it is fine?

Do you think that this bug on I/O cache coherency could also trigger some
sporadic random OOPS and kernel panic? I got an OOPS with a broken LR in
skb_segment() and a kernel panic in put_page(), but I was never able to
reproduce any of them.

Thanks,
Francesco

Comments

Thomas Petazzoni Nov. 19, 2014, 4:40 p.m. UTC | #1
Dear Francesco Dolcini,

On Wed, 19 Nov 2014 17:01:08 +0100, Francesco Dolcini wrote:

> On Tue, Nov 18, 2014 at 03:30:35PM +0100, Thomas Petazzoni wrote:
> > It could indeed be related. I have marked the relevant patches in the 
> > "ARM: mvebu: no I/O coherency on non-SMP and related updates" series as
> > to be backported to stable up to v3.8, so when they get accepted, I'll
> > take care of backporting them.
> 
> I prepared and tested this small patch to fix the problem on kernel
> 3.13.11 and it seems to fix my ethernet packet corruption problem.
> Do you think it is fine?

There's one missing thing: as of 3.13, the mvebu-mbus driver was
directly looking at the DT to see if it had a coherency fabric node,
and if that's the case, then it was enabling the per-MBus window bit
telling that this window uses HW I/O coherency. I'm not sure it causes
some problems in practice, since with your patch all the cache
maintenance operations are properly re-enabled anyway. But still, I'd
suggest modifying the mvebu-mbus driver accordingly. See:

	np = of_find_compatible_node(NULL, NULL, "marvell,coherency-fabric");
	if (np) {
		mbus->hw_io_coherency = 1;
		of_node_put(np);
	}

in drivers/bus/mvebu-mbus.c.
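
A minimal sketch of the kind of change this could mean, written as a
userspace model rather than kernel code. It assumes the fix is simply to
gate the flag on is_smp() as well as on the presence of the DT node. The
struct, the two fake_* variables, and the helpers dt_has_coherency_fabric()
and mbus_setup_coherency() are hypothetical stand-ins for illustration;
only the hw_io_coherency field name and the is_smp() check come from the
code quoted in this thread.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver state; only the
 * hw_io_coherency field name comes from the snippet above. */
struct mbus_state {
	int hw_io_coherency;
};

/* Hypothetical knobs replacing the real kernel environment. */
static bool fake_is_smp;
static bool fake_has_coherency_node;

/* Stub for the kernel's is_smp(); here it just returns the knob. */
static bool is_smp(void)
{
	return fake_is_smp;
}

/* Stub for the of_find_compatible_node() lookup of the
 * "marvell,coherency-fabric" node. */
static bool dt_has_coherency_fabric(void)
{
	return fake_has_coherency_node;
}

/* Set the per-window coherency flag only when the system is SMP
 * and the coherency fabric node exists, so that the MBus windows
 * stay consistent with coherency being disabled on non-SMP. */
static void mbus_setup_coherency(struct mbus_state *mbus)
{
	mbus->hw_io_coherency = 0;
	if (is_smp() && dt_has_coherency_fabric())
		mbus->hw_io_coherency = 1;
}
```

With this gating, a non-SMP boot leaves hw_io_coherency at 0 even when
the DT describes a coherency fabric, which matches what the coherency.c
patch does on its side.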

> Do you think that this bug on I/O cache coherency could also trigger some
> sporadic random OOPS and kernel panic? I got an OOPS with a broken LR in
> skb_segment() and a kernel panic in put_page(), but I was never able to
> reproduce any of them.

It's hard to say exactly what could happen with the wrong I/O cache
coherency setup. I would expect only the buffers used for DMA to not be
updated properly, but I might be wrong.

Thomas

Willy Tarreau Nov. 19, 2014, 4:57 p.m. UTC | #2
On Wed, Nov 19, 2014 at 05:40:07PM +0100, Thomas Petazzoni wrote:
> > Do you think that this bug on I/O cache coherency could also trigger some
> > sporadic random OOPS and kernel panic? I got an OOPS with a broken LR in
> > skb_segment() and a kernel panic in put_page(), but I was never able to
> > reproduce any of them.
> 
> It's hard to say exactly what could happen with the wrong I/O cache
> coherency setup. I would expect only the buffers used for DMA to not be
> updated properly, but I might be wrong.

Interestingly I used to experience some random panics under high network
loads on the mirabox and I never knew whether they were attributed to the
power supply or to cache corruption. But since I have modified the driver
and cache management to synchronize caches before the Rx loop, I haven't
encountered them anymore. It could be a pure coincidence just like it
could also be more or less related, maybe due to the fact that the cache
is synchronized much earlier than the data are used and that this changes
the access patterns.

Just my few cents,
Willy

Patch

--- a/arch/arm/mach-mvebu/coherency.c
+++ b/arch/arm/mach-mvebu/coherency.c
@@ -124,6 +124,12 @@ 
 {
 	struct device_node *np;
 
+	if (!is_smp())
+	{
+	    pr_info("Coherency fabric disabled\n");
+	    return 0;
+	}
+
 	np = of_find_matching_node(NULL, of_coherency_table);
 	if (np) {
 		struct resource res;
@@ -150,6 +156,9 @@ 
 {
 	struct device_node *np;
 
+	if (!is_smp())
+	    return 0;
+
 	np = of_find_matching_node(NULL, of_coherency_table);
 	if (np) {
 		bus_register_notifier(&platform_bus_type,