diff mbox

Intel I350 mini-PCIe card (igb) on Mirabox (mvebu / Armada 370)

Message ID 20140407174106.GD9952@obsidianresearch.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jason Gunthorpe April 7, 2014, 5:41 p.m. UTC
On Sun, Apr 06, 2014 at 10:57:40PM +0100, Neil Greatorex wrote:
> Willy,
> 
> On Sun, 6 Apr 2014, Willy Tarreau wrote:
> 
> >As you can see, the sequence is exactly the same for both ports. The
> >first one has no problem performing the readl(), but the second one
> >cannot. Both of them got the memory address returned by a call to
> >pci_iomap(dev, 0, 0). I could verify that the pci_resource_len() for
> >both is 524288 (0x80000).
> >
> >The last "hwaddr=f0400000" is printed just before calling rd32() and
> >shows that it was still OK there. Since the resource flags are 0x40200,
> >we only have IORESOURCE_MEM so pci_iomap() calls ioremap_nocache().
> >
> 
> Good work - I've been doing similar things myself! I can confirm
> that I see exactly the same thing with similar printks:
> 
> First port:
> [ 1809.452878] igb 0000:01:00.0: enabling bus mastering
> [ 1809.453098] igb 0000:01:00.0 (unregistered net_device): hw_addr
> is f1000000, start=e0000000, len=80000, flags=40200
> [ 1809.453109] igb 0000:01:00.0 (unregistered net_device): About to
> read from offset 18
> [ 1809.453120] igb 0000:01:00.0 (unregistered net_device): Read from
> 18 returned 1400c0
> 
> Second port:
> [ 1809.459445] igb 0000:01:00.1: enabling bus mastering
> [ 1809.459563] igb 0000:01:00.1 (unregistered net_device): hw_addr is
> f1100000, start=e0100000, len=80000, flags=40200
> [ 1809.459573] igb 0000:01:00.1 (unregistered net_device): About to read
> from offset 18
> [ 1809.459581] Unhandled fault: external abort on non-linefetch
> (0x1008) at 0xf1100018
> 
> In the output above, the start= part shows the physical address and
> hw_addr shows the mapped address.

This is very similar to what Matthew Minter
<matthew_minter@xyratex.com> is seeing on Hot Plug with AHCI. (See
'Armada XP (mvebu) PCIe memory (BAR/window) re-allocation' thread)

That probably says it is somehow mbus related - dumping the mbus
registers when the fault happens should clarify that point. The size
would a good place to check first.

> The physical addresses match those given in the lspci -vvv output
> (see https://gist.github.com/ngreatorex/9772195). I don't know
> enough about PCIe, the SoC *or* the Intel card to know if these
> addresses look correct or even sane! I did wonder if there was some
> issue due to the fact that the resources for 01:00.0 and 01:00.1
> overlap, but I would guess(!?) that it's common in hardware that
> presents multiple devices.

Which overlap?

To be very clear, PCI BARs, should never overlap.

The bridge windows should fully contain downstream bars:

00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) (prog-if 00 [Normal decode])
	Bus: primary=00, secondary=01, subordinate=02, sec-latency=0
	Memory behind bridge: e0000000-e02fffff
01:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
	Region 0: Memory at e0000000 (32-bit, non-prefetchable) [disabled] [size=512K]
01:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
	Region 0: Memory at e0100000 (32-bit, non-prefetchable) [disabled] [size=512K]

Looks good to me.

HOWEVER, looking now very closely:

00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) (prog-if 00 [Normal decode])
   Memory behind bridge: e0000000-e02fffff
00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) (prog-if 00 [Normal decode])
   Memory behind bridge: e0300000-e03fffff

This is certainly wrong, MBUS requires special alignment and sizing.
0x300000 is not a size which is a power of two, and the next window
starts right after.

We need to see the first bridge use e0000000-e03fffff

Just to confirm, what does something like the below say for you guys?



Regards,
Jason

Comments

Neil Greatorex April 7, 2014, 7:41 p.m. UTC | #1
Jason, Thomas,

On Mon, Apr 7, 2014 at 6:41 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
>>
>> First port:
>> [ 1809.452878] igb 0000:01:00.0: enabling bus mastering
>> [ 1809.453098] igb 0000:01:00.0 (unregistered net_device): hw_addr
>> is f1000000, start=e0000000, len=80000, flags=40200
>> [ 1809.453109] igb 0000:01:00.0 (unregistered net_device): About to
>> read from offset 18
>> [ 1809.453120] igb 0000:01:00.0 (unregistered net_device): Read from
>> 18 returned 1400c0
>>
>> Second port:
>> [ 1809.459445] igb 0000:01:00.1: enabling bus mastering
>> [ 1809.459563] igb 0000:01:00.1 (unregistered net_device): hw_addr is
>> f1100000, start=e0100000, len=80000, flags=40200
>> [ 1809.459573] igb 0000:01:00.1 (unregistered net_device): About to read
>> from offset 18
>> [ 1809.459581] Unhandled fault: external abort on non-linefetch
>> (0x1008) at 0xf1100018
>>
>> In the output above, the start= part shows the physical address and
>> hw_addr shows the mapped address.
>
> This is very similar to what Matthew Minter
> <matthew_minter@xyratex.com> is seeing on Hot Plug with AHCI. (See
> 'Armada XP (mvebu) PCIe memory (BAR/window) re-allocation' thread)
>
> That probably says it is somehow mbus related - dumping the mbus
> registers when the fault happens should clarify that point. The size
> would a good place to check first.
>
>> The physical addresses match those given in the lspci -vvv output
>> (see https://gist.github.com/ngreatorex/9772195). I don't know
>> enough about PCIe, the SoC *or* the Intel card to know if these
>> addresses look correct or even sane! I did wonder if there was some
>> issue due to the fact that the resources for 01:00.0 and 01:00.1
>> overlap, but I would guess(!?) that it's common in hardware that
>> presents multiple devices.
>
> Which overlap?
>
> To be very clear, PCI BARs, should never overlap.
>

I realise that overlap was probably the wrong word. I meant that the
resources for 01:00.0 and 01:00.1 are not contiguous but are mixed
together. If you sort by address you get:

e0000000-e007ffff : 0000:01:00.0
e0080000-e00fffff : 0000:01:00.0
e0100000-e017ffff : 0000:01:00.1
e0180000-e01fffff : 0000:01:00.1
e0200000-e0203fff : 0000:01:00.0
e0204000-e0223fff : 0000:01:00.0
e0224000-e0243fff : 0000:01:00.0
e0244000-e0247fff : 0000:01:00.1
e0248000-e0267fff : 0000:01:00.1
e0268000-e0287fff : 0000:01:00.1

> The bridge windows should fully contain downstream bars:
>
> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) (prog-if 00 [Normal decode])
>         Bus: primary=00, secondary=01, subordinate=02, sec-latency=0
>         Memory behind bridge: e0000000-e02fffff
> 01:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
>         Region 0: Memory at e0000000 (32-bit, non-prefetchable) [disabled] [size=512K]
> 01:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
>         Region 0: Memory at e0100000 (32-bit, non-prefetchable) [disabled] [size=512K]
>
> Looks good to me.
>
> HOWEVER, looking now very closely:
>
> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) (prog-if 00 [Normal decode])
>    Memory behind bridge: e0000000-e02fffff
> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) (prog-if 00 [Normal decode])
>    Memory behind bridge: e0300000-e03fffff
>
> This is certainly wrong, MBUS requires special alignment and sizing.
> 0x300000 is not a size which is a power of two, and the next window
> starts right after.
>

Interesting. Does the PCI code provide a way to specify that the sizes
much be a power of 2? I don't fully understand the implications but
would it be possible to assign just one MBUS window for the whole of
the PCIe memory instead?

> We need to see the first bridge use e0000000-e03fffff
>
> Just to confirm, what does something like the below say for you guys?

See https://gist.github.com/ngreatorex/10025253 for the dmesg output.
I have also included the contents of
/sys/kernel/debug/mvebu-mbus/devices both before and after the
modprobe / oops. As you can see I get a total of 3 WARNINGs - one at
boot for the xHCI controller, and two when inserting igb.ko. Note that
this time I did this with both ports enabled.

Cheers,
Neil
Jason Gunthorpe April 7, 2014, 8:48 p.m. UTC | #2
On Mon, Apr 07, 2014 at 08:41:52PM +0100, Neil Greatorex wrote:
> > To be very clear, PCI BARs, should never overlap.
> 
> I realise that overlap was probably the wrong word. I meant that the
> resources for 01:00.0 and 01:00.1 are not contiguous but are mixed
> together. If you sort by address you get:
> 
> e0000000-e007ffff : 0000:01:00.0
> e0080000-e00fffff : 0000:01:00.0
> e0100000-e017ffff : 0000:01:00.1
> e0180000-e01fffff : 0000:01:00.1
> e0200000-e0203fff : 0000:01:00.0
> e0204000-e0223fff : 0000:01:00.0
> e0224000-e0243fff : 0000:01:00.0
> e0244000-e0247fff : 0000:01:00.1
> e0248000-e0267fff : 0000:01:00.1
> e0268000-e0287fff : 0000:01:00.1

Yes, that is fine. It is just a quirk of the allocator.
They are all within the downstream bridge window. 
 
> > HOWEVER, looking now very closely:
> >
> > 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) (prog-if 00 [Normal decode])
> >    Memory behind bridge: e0000000-e02fffff
> > 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01) (prog-if 00 [Normal decode])
> >    Memory behind bridge: e0300000-e03fffff
> >
> > This is certainly wrong, MBUS requires special alignment and sizing.
> > 0x300000 is not a size which is a power of two, and the next window
> > starts right after.

> Interesting. Does the PCI code provide a way to specify that the sizes
> much be a power of 2? I don't fully understand the implications but
> would it be possible to assign just one MBUS window for the whole of
> the PCIe memory instead?

I know this has come up before, but I don't recall where things
settled out.. mvebu_pcie_align_resource is the function to look at,
the size at that point needs to be rounded up to a power of two and
communicated back to the caller.

Alternately, I belive Thomas once discussed using multiple mbus
windows for these non-aligned requests.

> > Just to confirm, what does something like the below say for you guys?
> 
> See https://gist.github.com/ngreatorex/10025253 for the dmesg output.
> I have also included the contents of
> /sys/kernel/debug/mvebu-mbus/devices both before and after the
> modprobe / oops. As you can see I get a total of 3 WARNINGs - one at
> boot for the xHCI controller, and two when inserting igb.ko. Note that
> this time I did this with both ports enabled.

Yep, that is certainly the root problem - most likely for
everyone. This also explains why Will saw success when he reverted
that unrelated patch. That change altered the allocation pattern of
the BARs and it just so happened to make things fall out properly.

We should also fix mvebu_mbus_read_window, so debugfs reports the
actual HW function. Instead of this:
        *size = (ctrlreg | ~WIN_CTRL_SIZE_MASK) + 1;

Something with ffs is required, perhaps (untested):
        *size = 2 << ffs(~(ctrlreg | ~WIN_CTRL_SIZE_MASK));

Thomas: Do you think the WARN_ON should head into mainline?

Jason
diff mbox

Patch

diff --git a/drivers/bus/mvebu-mbus.c b/drivers/bus/mvebu-mbus.c
index 2ac754e..dfeb0dd 100644
--- a/drivers/bus/mvebu-mbus.c
+++ b/drivers/bus/mvebu-mbus.c
@@ -56,6 +56,7 @@ 
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/debugfs.h>
+#include <linux/log2.h>
 
 /*
  * DDR target is the same on all platforms.
@@ -266,6 +267,8 @@  static int mvebu_mbus_setup_window(struct mvebu_mbus_state *mbus,
                mbus->soc->win_cfg_offset(win);
        u32 ctrl, remap_addr;
 
+       WARN_ON(!is_power_of_2(size));
+
        ctrl = ((size - 1) & WIN_CTRL_SIZE_MASK) |
                (attr << WIN_CTRL_ATTR_SHIFT)    |
                (target << WIN_CTRL_TGT_SHIFT)   |