
[v3,2/2] PCI: imx6: limit DBI register length

Message ID 20181120165626.26424-2-stefan@agner.ch (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Series [v3,1/2] PCI: imx6: introduce drvdata

Commit Message

Stefan Agner Nov. 20, 2018, 4:56 p.m. UTC
Define the length of the DBI registers. This makes sure that
the kernel does not access registers beyond that point, avoiding
the following abort on an i.MX 6Quad:
  # cat /sys/devices/soc0/soc/1ffc000.pcie/pci0000\:00/0000\:00\:00.0/config
  [  100.021433] Unhandled fault: imprecise external abort (0x1406) at 0xb6ea7000
  ...
  [  100.056423] PC is at dw_pcie_read+0x50/0x84
  [  100.060790] LR is at dw_pcie_rd_own_conf+0x44/0x48
  ...

Signed-off-by: Stefan Agner <stefan@agner.ch>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
---
Changes in v3:
- Rebase on pci/dwc

 drivers/pci/controller/dwc/pci-imx6.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Leonard Crestez Nov. 20, 2018, 6:19 p.m. UTC | #1
On Tue, 2018-11-20 at 17:56 +0100, Stefan Agner wrote:
> Define the length of the DBI registers. This makes sure that
> the kernel does not access registers beyond that point, avoiding
> the following abort on a i.MX 6Quad:
>   # cat
> /sys/devices/soc0/soc/1ffc000.pcie/pci0000\:00/0000\:00\:00.0/config
>   [  100.021433] Unhandled fault: imprecise external abort (0x1406)
> at 0xb6ea7000
>   ...
>   [  100.056423] PC is at dw_pcie_read+0x50/0x84
>   [  100.060790] LR is at dw_pcie_rd_own_conf+0x44/0x48

I don't know exactly where this limitation comes from, but I can indeed
reproduce a stack dump when dumping the PCI config from /sys/.

Unfortunately this seems to block access to registers used for
functionality like interrupts. For example dw_handle_msi_irq does:

	dw_pcie_rd_own_conf(pp, PCIE_MSI_INTR0_STATUS +
			    (i * MSI_REG_CTRL_BLOCK_SIZE),
			    4, &val);

where PCIE_MSI_INTR0_STATUS is 0x830. There are more accesses like this.
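
To make the offsets concrete, here is a minimal user-space sketch, not driver code. It assumes the series limits the i.MX 6Q DBI space to 0x200 bytes and that MSI_REG_CTRL_BLOCK_SIZE is 12; both values should be checked against the patch and pcie-designware.h. IMX6Q_DBI_LENGTH, dbi_access_allowed() and msi_status_offset() are hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed for illustration; verify against the patch and headers. */
#define PCIE_MSI_INTR0_STATUS   0x830
#define MSI_REG_CTRL_BLOCK_SIZE 12
#define IMX6Q_DBI_LENGTH        0x200	/* hypothetical name for the limit */

/* Hypothetical helper: true if [where, where + size) ends at or below the limit. */
static bool dbi_access_allowed(uint32_t where, uint32_t size, uint32_t dbi_length)
{
	return where + size <= dbi_length;
}

/* Offset of the MSI status register for controller block i. */
static uint32_t msi_status_offset(unsigned int i)
{
	return PCIE_MSI_INTR0_STATUS + i * MSI_REG_CTRL_BLOCK_SIZE;
}
```

Under these assumptions every MSI status register sits well above the limit, so a plain length check applied to internal accesses would reject the reads that the MSI handler depends on.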

Testing on 6dl-sabreauto (dts change required) with an ath9k pcie card
with your series I sometimes get "irq 295: nobody cared" on boot. Maybe
I'm missing something?

--
Regards,
Leonard
Trent Piepho Nov. 20, 2018, 7:13 p.m. UTC | #2
On Tue, 2018-11-20 at 18:19 +0000, Leonard Crestez wrote:
> On Tue, 2018-11-20 at 17:56 +0100, Stefan Agner wrote:
> > Define the length of the DBI registers. This makes sure that
> > the kernel does not access registers beyond that point, avoiding
> > the following abort on a i.MX 6Quad:
> >   # cat
> > /sys/devices/soc0/soc/1ffc000.pcie/pci0000\:00/0000\:00\:00.0/config
> >   [  100.021433] Unhandled fault: imprecise external abort (0x1406)
> > at 0xb6ea7000
> >   ...
> >   [  100.056423] PC is at dw_pcie_read+0x50/0x84
> >   [  100.060790] LR is at dw_pcie_rd_own_conf+0x44/0x48
> 
> I don't know exactly where this limitation comes from, I can indeed
> reproduce a stack dump when dumping pci config from /sys/
> 
> Unfortunately this seems to block access to registers used for
> functionality like interrupts. For example dw_handle_msi_irq does:
> 
> 	dw_pcie_rd_own_conf(pp, PCIE_MSI_INTR0_STATUS +
> 			    (i * MSI_REG_CTRL_BLOCK_SIZE),
> 			    4, &val);
> 
> where PCIE_MSI_INTR0_STATUS is 0x830. There are more accesses like this.
> 
> Testing on 6dl-sabreauto (dts change required) with an ath9k pcie card
> with your series I sometimes get "irq 295: nobody cared" on boot. Maybe
> I'm missing something?

On IMX7d, there are significant blocks of 00s in the config space, and
all 0xff at 0xb50 on up.

I.e., significant portions are empty, in the middle of the config
space, not just at the end.

But they can be read without problem.

Perhaps imx6q aborts on a read of an unimplemented address instead of
returning zeros like imx7d.  In that case it really needs something
more complex to prevent abort than just a length.

It also seems to me that this doesn't need to be in the internal pci
config access functions.  The driver shouldn't be reading registers
that don't exist anyway.  It's really about trying to fix sysfs access
to registers that don't exist.  So maybe it should be done there.
Stefan Agner Nov. 20, 2018, 8:42 p.m. UTC | #3
On 20.11.2018 20:13, Trent Piepho wrote:
> On Tue, 2018-11-20 at 18:19 +0000, Leonard Crestez wrote:
>> On Tue, 2018-11-20 at 17:56 +0100, Stefan Agner wrote:
>> > Define the length of the DBI registers. This makes sure that
>> > the kernel does not access registers beyond that point, avoiding
>> > the following abort on a i.MX 6Quad:
>> >   # cat
>> > /sys/devices/soc0/soc/1ffc000.pcie/pci0000\:00/0000\:00\:00.0/config
>> >   [  100.021433] Unhandled fault: imprecise external abort (0x1406)
>> > at 0xb6ea7000
>> >   ...
>> >   [  100.056423] PC is at dw_pcie_read+0x50/0x84
>> >   [  100.060790] LR is at dw_pcie_rd_own_conf+0x44/0x48
>>
>> I don't know exactly where this limitation comes from, I can indeed
>> reproduce a stack dump when dumping pci config from /sys/
>>
>> Unfortunately this seems to block access to registers used for
>> functionality like interrupts. For example dw_handle_msi_irq does:
>>
>> 	dw_pcie_rd_own_conf(pp, PCIE_MSI_INTR0_STATUS +
>> 			    (i * MSI_REG_CTRL_BLOCK_SIZE),
>> 			    4, &val);
>>
>> where PCIE_MSI_INTR0_STATUS is 0x830. There are more accesses like this.
>>
>> Testing on 6dl-sabreauto (dts change required) with an ath9k pcie card
>> with your series I sometimes get "irq 295: nobody cared" on boot. Maybe
>> I'm missing something?
> 
> On IMX7d, there are significant blocks of 00s in the config space, and
> all 0xff at 0xb50 on up.
> 
> I.e., significant portions are empty, in the middle of the config
> space, not just at the end.
> 
> But they can be read without problem.
> 
> Perhaps imx6q aborts on a read of an unimplemented address instead of
> returning zeros like imx7d.  In that case it really needs something
> more complex to prevent abort than just a length.

Yeah it seems those SoCs behave differently.

Describing a register set with holes will get complicated, I guess it
would ask for a regmap...

> 
> It also seems to me that this doesn't need to be in the internal pci
> config access functions.  The driver shouldn't be reading registers
> that don't exist anyway.  It's really about trying to fix sysfs access
> to registers that don't exist.  So maybe it should be done there.

That was my first approach, see:
https://lkml.org/lkml/2018/11/14/716

--
Stefan
Trent Piepho Nov. 20, 2018, 9:28 p.m. UTC | #4
On Tue, 2018-11-20 at 21:42 +0100, Stefan Agner wrote:
> On 20.11.2018 20:13, Trent Piepho wrote:
> > 
> > On IMX7d, there are significant blocks of 00s in the config space, and
> > all 0xff at 0xb50 on up.
> > 
> > I.e., significant portions are empty, in the middle of the config
> > space, not just at the end.
> > 
> > But they can be read without problem.
> > 
> > Perhaps imx6q aborts on a read of an unimplemented address instead of
> > returning zeros like imx7d.  In that case it really needs something
> > more complex to prevent abort than just a length.
> 
> Yeah it seems those SoCs behave differently.
> 
> Describing a register set with holes will get complicated, I guess it
> would ask for a regmap...
> 
> > 
> > It also seems to me that this doesn't need to be in the internal pci
> > config access functions.  The driver shouldn't be reading registers
> > that don't exist anyway.  It's really about trying to fix sysfs access
> > to registers that don't exist.  So maybe it should be done there.
> 
> That was my first approach, see:

https://lkml.org/lkml/2018/11/14/716

Yes, but that just used the pci device id which applies to every IMX
design.

It's also not totally correct, as it seems real registers after 0x200
do work on imx6, and that would prevent access to them.

Like you say, it could use a regmap.  Seems kinda overkill to me
though.

I wonder if regmap-based caching of registers to avoid RMW cycles would
be generally useful?  I know the enable and mask registers are/were
cached in the driver (irq_state[]).
Leonard Crestez Nov. 21, 2018, 1:47 p.m. UTC | #5
On 11/20/2018 11:28 PM, Trent Piepho wrote:
> On Tue, 2018-11-20 at 21:42 +0100, Stefan Agner wrote:
>> On 20.11.2018 20:13, Trent Piepho wrote:

>>> It also seems to me that this doesn't need to be in the internal pci
>>> config access functions.  The driver shouldn't be reading registers
>>> that don't exist anyway.  It's really about trying to fix sysfs access
>>> to registers that don't exist.  So maybe it should be done there.
>>
>> That was my first approach, see:
> 
> Yes, but that just used the pci device id which applies to every IMX
> design.
> 
> It's also not totally correct, as it seems real registers after 0x200
> do work on imx6, and that would prevent access to them.

I see that Lorenzo already accepted the patch in pci/dwc:

https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/dwc&id=f14eaec153aaebbe940ddd21e4198cc2abc927c2

My tests show that this series breaks pci cards on 6qdl and I think it 
should be reverted until a fix is found. Are you OK with this?

Fixing might require an entirely different approach.
Lorenzo Pieralisi Nov. 21, 2018, 2:17 p.m. UTC | #6
On Wed, Nov 21, 2018 at 01:47:05PM +0000, Leonard Crestez wrote:
> On 11/20/2018 11:28 PM, Trent Piepho wrote:
> > On Tue, 2018-11-20 at 21:42 +0100, Stefan Agner wrote:
> >> On 20.11.2018 20:13, Trent Piepho wrote:
> 
> >>> It also seems to me that this doesn't need to be in the internal pci
> >>> config access functions.  The driver shouldn't be reading registers
> >>> that don't exist anyway.  It's really about trying to fix sysfs access
> >>> to registers that don't exist.  So maybe it should be done there.
> >>
> >> That was my first approach, see:
> > 
> > Yes, but that just used the pci device id which applies to every IMX
> > design.
> > 
> > It's also not totally correct, as it seems real registers after 0x200
> > do work on imx6, and that would prevent access to them.
> 
> I see that Lorenzo already accepted the patch in pci/dwc:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/dwc&id=f14eaec153aaebbe940ddd21e4198cc2abc927c2
> 
> My tests show that this series breaks pci cards on 6qdl and I think it 
> should be reverted until a fix is found. Are you OK with this?

I will drop the patches from the PCI queue.

Lorenzo
Leonard Crestez Nov. 26, 2018, 10:16 a.m. UTC | #7
On Tue, 2018-11-20 at 21:42 +0100, Stefan Agner wrote:
> On 20.11.2018 20:13, Trent Piepho wrote:
> > On Tue, 2018-11-20 at 18:19 +0000, Leonard Crestez wrote:
> > > On Tue, 2018-11-20 at 17:56 +0100, Stefan Agner wrote:
> > > > Define the length of the DBI registers. This makes sure that
> > > > the kernel does not access registers beyond that point, avoiding
> > > > the following abort on a i.MX 6Quad:
> > > >   # cat
> > > > /sys/devices/soc0/soc/1ffc000.pcie/pci0000\:00/0000\:00\:00.0/config
> > > >   [  100.021433] Unhandled fault: imprecise external abort (0x1406)
> > > > at 0xb6ea7000
> > > >   ...
> > > >   [  100.056423] PC is at dw_pcie_read+0x50/0x84
> > > >   [  100.060790] LR is at dw_pcie_rd_own_conf+0x44/0x48
> > > 
> > > I don't know exactly where this limitation comes from, I can indeed
> > > reproduce a stack dump when dumping pci config from /sys/
> > > 
> > > Unfortunately this seems to block access to registers used for
> > > functionality like interrupts. For example dw_handle_msi_irq does:
> > > 
> > > 	dw_pcie_rd_own_conf(pp, PCIE_MSI_INTR0_STATUS +
> > > 			    (i * MSI_REG_CTRL_BLOCK_SIZE),
> > > 			    4, &val);
> > > 
> > > where PCIE_MSI_INTR0_STATUS is 0x830. There are more accesses like this.
> > 
> > On IMX7d, there are significant blocks of 00s in the config space, and
> > all 0xff at 0xb50 on up.
> > 
> > I.e., significant portions are empty, in the middle of the config
> > space, not just at the end.
> > 
> > But they can be read without problem.
> > 
> > Perhaps imx6q aborts on a read of an unimplemented address instead of
> > returning zeros like imx7d.  In that case it really needs something
> > more complex to prevent abort than just a length.
> 
> Yeah it seems those SoCs behave differently.
> 
> Describing a register set with holes will get complicated, I guess it
> would ask for a regmap...

The PortLogic area seems to be always at 0x700-0xA00, there are defines
for it. So this would only require one additional range, no regmap
stuff.
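
As a rough sketch of that idea (user-space C, not driver code), a small table of windows is enough to express "config area plus port logic". The 0x200 end of the first window is an assumption; the 0x700-0xA00 window is taken from the paragraph above, with 0xA00 treated as an exclusive end. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open register window [start, end). */
struct dbi_range {
	uint32_t start;
	uint32_t end;
};

/* Assumed i.MX 6Q layout: config area below 0x200, port logic at 0x700-0xA00. */
static const struct dbi_range imx6q_dbi_ranges[] = {
	{ 0x000, 0x200 },
	{ 0x700, 0xa00 },
};

/* Hypothetical helper: true if the whole access lies inside one window. */
static bool dbi_range_valid(const struct dbi_range *ranges, size_t n,
			    uint32_t where, uint32_t size)
{
	for (size_t i = 0; i < n; i++) {
		if (where >= ranges[i].start && where + size <= ranges[i].end)
			return true;
	}
	return false;
}
```

With such a table the MSI status register at 0x830 stays reachable, while an access in the gap between the two windows is rejected.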

I don't know if making portlogic accessible to userspace is useful. In
theory somebody could read debug DWC registers via /sys/blah/config,
but there are better ways to read HW registers from userspace.

Browsing through the docs I can't find stuff like reads with side-
effects so I can't say that it's harmful either.

> > It also seems to me that this doesn't need to be in the internal pci
> > config access functions.  The driver shouldn't be reading registers
> > that don't exist anyway.  It's really about trying to fix sysfs access
> > to registers that don't exist.  So maybe it should be done there.
> 
> That was my first approach

Doing it on a per-SoC basis is better; this seems to affect both 6q and
6qp (those are distinct!) but not others.

The pci config area from userspace is accessed through pci_ops.read =
dw_pcie_rd_conf while internal accesses go through dw_pcie_rd_own_conf.
So moving the dbi_length check up one level would work easily.

What about assigning to pci_dev->cfg_size from probe code?
Unfortunately there doesn't seem to be any straight-forward way to do
that from imx6_pcie or even dw_pcie, but maybe I'm missing something?

--
Regards,
Leonard
Trent Piepho Nov. 26, 2018, 4:34 p.m. UTC | #8
On Mon, 2018-11-26 at 10:16 +0000, Leonard Crestez wrote:
> On Tue, 2018-11-20 at 21:42 +0100, Stefan Agner wrote:
> > On 20.11.2018 20:13, Trent Piepho wrote:
> > > On Tue, 2018-11-20 at 18:19 +0000, Leonard Crestez wrote:
> > > > On Tue, 2018-11-20 at 17:56 +0100, Stefan Agner wrote:
> > > > > Define the length of the DBI registers. This makes sure that
> > > > > the kernel does not access registers beyond that point, avoiding
> > > > > the following abort on a i.MX 6Quad:
> > > > >   # cat
> > > > > /sys/devices/soc0/soc/1ffc000.pcie/pci0000\:00/0000\:00\:00.0/config
> > > > >   [  100.021433] Unhandled fault: imprecise external abort (0x1406)
> > > > > at 0xb6ea7000
> > > > >   ...
> > > > >   [  100.056423] PC is at dw_pcie_read+0x50/0x84
> > > > >   [  100.060790] LR is at dw_pcie_rd_own_conf+0x44/0x48
> > > > 
> > > > I don't know exactly where this limitation comes from, I can indeed
> > > > reproduce a stack dump when dumping pci config from /sys/
> > > > 
> > > > Unfortunately this seems to block access to registers used for
> > > > functionality like interrupts. For example dw_handle_msi_irq does:
> > > > 
> > > > 	dw_pcie_rd_own_conf(pp, PCIE_MSI_INTR0_STATUS +
> > > > 			    (i * MSI_REG_CTRL_BLOCK_SIZE),
> > > > 			    4, &val);
> > > > 
> > > > where PCIE_MSI_INTR0_STATUS is 0x830. There are more accesses like this.
> > > 
> > > On IMX7d, there are significant blocks of 00s in the config space, and
> > > all 0xff at 0xb50 on up.
> > > 
> > > I.e., significant portions are empty, in the middle of the config
> > > space, not just at the end.
> > > 
> > > But they can be read without problem.
> > > 
> > > Perhaps imx6q aborts on a read of an unimplemented address instead of
> > > returning zeros like imx7d.  In that case it really needs something
> > > more complex to prevent abort than just a length.
> > 
> > Yeah it seems those SoCs behave differently.
> > 
> > Describing a register set with holes will get complicated, I guess it
> > would ask for a regmap...
> 
> The PortLogic area seems to be always at 0x700-0xA00, there are defines
> for it. So this would only require one additional range, no regmap
> stuff.
> 
> I don't know if making portlogic accessible to userspace is useful. In
> theory somebody could read debug DWC registers via /sys/blah/config,
> but there are better ways to read HW registers from userspace.
> 
> Browsing through the docs I can't find stuff like reads with side-
> effects so I can't say that it's harmful either.

I doubt those registers are used much from userspace, since AFAIK there
are no public docs for them.  I have dumped the config regs from
userspace in imx7d to debug a hang problem, but those registers weren't
useful to me.  Maybe they would have been, no docs...

> > > It also seems to me that this doesn't need to be in the internal pci
> > > config access functions.  The driver shouldn't be reading registers
> > > that don't exist anyway.  It's really about trying to fix sysfs access
> > > to registers that don't exist.  So maybe it should be done there.
> > 
> > That was my first approach
> 
> Doing it on a per-soc basis is better, this seems to affect both 6q and
> 6qp (those are distinct!) but not others.
> 
> The pci config area from userspace is accessed through pci_ops.read =
> dw_pcie_rd_conf while internal accesses go through dw_pcie_rd_own_conf.
> So moving the dbi_length check up one level would work easily.

What about something like:

static int dw_pcie_rd_conf(struct pci_bus *bus, u32 devfn, int where,
                           int size, u32 *val)
{
	struct pcie_port *pp = bus->sysdata;

	if (!dw_pcie_valid_device(pp, bus, PCI_SLOT(devfn))) {
		*val = 0xffffffff;
		return PCIBIOS_DEVICE_NOT_FOUND;
	}

	if (bus->number == pp->root_bus_nr) {
+		if (pp->ops->cfg_valid && !pp->ops->cfg_valid(pp, where, size)) {
+			*val = 0xffffffff;
+			return PCIBIOS_SOMETHING_GOES_HERE;
+		}
+
		return dw_pcie_rd_own_conf(pp, where, size, val);
+	}
	return dw_pcie_rd_other_conf(pp, bus, devfn, where, size, val);
}

imx6q and imx6qp can provide cfg_valid that checks their range.
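
A minimal user-space sketch of how such a hook could behave, assuming a 0x200 limit for the affected SoCs. The structs below are simplified stand-ins for the real kernel types, and imx6q_cfg_valid(), fake_rd_own_conf() and root_bus_read() are hypothetical names; none of this is existing driver code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pcie_port;

/* Stand-in for the host ops; only the proposed optional hook is modelled. */
struct dw_pcie_host_ops {
	bool (*cfg_valid)(struct pcie_port *pp, int where, int size);
};

struct pcie_port {
	const struct dw_pcie_host_ops *ops;
};

/* Hypothetical i.MX 6Q/6QP hook: allow offsets 0x000-0x1ff only (assumed limit). */
static bool imx6q_cfg_valid(struct pcie_port *pp, int where, int size)
{
	(void)pp;
	return where >= 0 && size > 0 && where + size <= 0x200;
}

static const struct dw_pcie_host_ops imx6q_host_ops = {
	.cfg_valid = imx6q_cfg_valid,
};

/* Stand-in for the real register read; returns a recognisable pattern. */
static uint32_t fake_rd_own_conf(int where)
{
	return 0x1000u + (uint32_t)where;
}

/* Mirrors the root-bus branch of the proposal: filter first, then read. */
static uint32_t root_bus_read(struct pcie_port *pp, int where, int size)
{
	if (pp->ops->cfg_valid && !pp->ops->cfg_valid(pp, where, size))
		return 0xffffffffu;	/* all ones, as for a missing register */
	return fake_rd_own_conf(where);
}
```

Because the filter sits in the pci_ops path only, internal callers that use dw_pcie_rd_own_conf directly would be unaffected, and SoCs that leave the hook unset would keep today's behaviour.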
Stefan Agner Nov. 28, 2018, 12:19 p.m. UTC | #9
On 21.11.2018 14:47, Leonard Crestez wrote:
> On 11/20/2018 11:28 PM, Trent Piepho wrote:
>> On Tue, 2018-11-20 at 21:42 +0100, Stefan Agner wrote:
>>> On 20.11.2018 20:13, Trent Piepho wrote:
> 
>>>> It also seems to me that this doesn't need to be in the internal pci
>>>> config access functions.  The driver shouldn't be reading registers
>>>> that don't exist anyway.  It's really about trying to fix sysfs access
>>>> to registers that don't exist.  So maybe it should be done there.
>>>
>>> That was my first approach, see:
>>
>> Yes, but that just used the pci device id which applies to every IMX
>> design.
>>
>> It's also not totally correct, as it seems real registers after 0x200
>> do work on imx6, and that would prevent access to them.
> 
> I see that Lorenzo already accepted the patch in pci/dwc:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/dwc&id=f14eaec153aaebbe940ddd21e4198cc2abc927c2
> 
> My tests show that this series breaks pci cards on 6qdl and I think it 
> should be reverted until a fix is found. Are you OK with this?
> 
> Fixing might require an entirely different approach.

I tried to reproduce this issue on Apalis iMX6 (i.MX 6Q) with an ath9k
PCIe WiFi card, the issue you are seeing did not happen. My lspci looks
as follows:

root@ea210c63d739:/# lspci -v
00:00.0 PCI bridge: Synopsys, Inc. Device abcd (rev 01) (prog-if 00 [Normal decode])
        Flags: bus master, fast devsel, latency 0, IRQ 255
        Memory at 01000000 (32-bit, non-prefetchable) [size=1M]
        Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
        Memory behind bridge: 01100000-011fffff
        [virtual] Expansion ROM at 01200000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
        Capabilities: [70] Express Root Port (Slot-), MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
lspci: Unable to load libkmod resources: error -12

01:00.0 Network controller: Qualcomm Atheros AR928X Wireless Network Adapter (PCI-Express) (rev 01)
        Subsystem: Foxconn International, Inc. Device e007
        Flags: bus master, fast devsel, latency 0, IRQ 312
        Memory at 01100000 (64-bit, non-prefetchable) [size=64K]
        Capabilities: [40] Power Management version 2
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
        Capabilities: [60] Express Legacy Endpoint, MSI 00
        Capabilities: [90] MSI-X: Enable- Count=1 Masked-
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
        Kernel driver in use: ath9k


I also set up a WiFi network and transmitted some packets, but I did
not get a "nobody cared" message. Do you have an idea why that might be?

# cat /proc/interrupts
...
312:      10967          0          0          0       GPC 123 Level     ath9k
...


Your conclusions in this thread seem reasonable, hence reverting the
patch does too. However, I would still like to reproduce the issue so I can
make sure that future patches don't break it :-)

--
Stefan
Stefan Agner Nov. 28, 2018, 5:36 p.m. UTC | #10
On 28.11.2018 13:19, Stefan Agner wrote:
> On 21.11.2018 14:47, Leonard Crestez wrote:
>> On 11/20/2018 11:28 PM, Trent Piepho wrote:
>>> On Tue, 2018-11-20 at 21:42 +0100, Stefan Agner wrote:
>>>> On 20.11.2018 20:13, Trent Piepho wrote:
>>
>>>>> It also seems to me that this doesn't need to be in the internal pci
>>>>> config access functions.  The driver shouldn't be reading registers
>>>>> that don't exist anyway.  It's really about trying to fix sysfs access
>>>>> to registers that don't exist.  So maybe it should be done there.
>>>>
>>>> That was my first approach, see:
>>>
>>> Yes, but that just used the pci device id which applies to every IMX
>>> design.
>>>
>>> It's also not totally correct, as it seems real registers after 0x200
>>> do work on imx6, and that would prevent access to them.
>>
>> I see that Lorenzo already accepted the patch in pci/dwc:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/dwc&id=f14eaec153aaebbe940ddd21e4198cc2abc927c2
>>
>> My tests show that this series breaks pci cards on 6qdl and I think it
>> should be reverted until a fix is found. Are you OK with this?
>>
>> Fixing might require an entirely different approach.
> 
> I tried to reproduce this issue on Apalis iMX6 (i.MX 6Q) with a ath9k
> PCIe WiFi card, the issue you are seeing did not happen. My lspci looks
> as follows:
> 
> root@ea210c63d739:/# lspci -v
> 00:00.0 PCI bridge: Synopsys, Inc. Device abcd (rev 01) (prog-if 00
> [Normal decode])
>         Flags: bus master, fast devsel, latency 0, IRQ 255
>         Memory at 01000000 (32-bit, non-prefetchable) [size=1M]
>         Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
>         Memory behind bridge: 01100000-011fffff
>         [virtual] Expansion ROM at 01200000 [disabled] [size=64K]
>         Capabilities: [40] Power Management version 3
>         Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
>         Capabilities: [70] Express Root Port (Slot-), MSI 00
>         Capabilities: [100] Advanced Error Reporting
>         Capabilities: [140] Virtual Channel
> lspci: Unable to load libkmod resources: error -12
> 
> 01:00.0 Network controller: Qualcomm Atheros AR928X Wireless Network
> Adapter (PCI-Express) (rev 01)
>         Subsystem: Foxconn International, Inc. Device e007
>         Flags: bus master, fast devsel, latency 0, IRQ 312
>         Memory at 01100000 (64-bit, non-prefetchable) [size=64K]
>         Capabilities: [40] Power Management version 2
>         Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
>         Capabilities: [60] Express Legacy Endpoint, MSI 00
>         Capabilities: [90] MSI-X: Enable- Count=1 Masked-
>         Capabilities: [100] Advanced Error Reporting
>         Capabilities: [140] Virtual Channel
>         Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
>         Kernel driver in use: ath9k
> 
> 
> I also set up a WiFi network and transmitted some packets, but I did
> not get a "nobody cared" message. Do you have an idea why that might be?
> 
> # cat /proc/interrupts
> ...
> 312:      10967          0          0          0       GPC 123 Level    
> ath9k
> ...
> 
> 
> Your conclusion in this thread seem reasonable, hence reverting the
> issue does. However, I still would like to reproduce the issue so I can
> make sure that future patches don't break it :-)

Hm, I realized that I need to enable CONFIG_PCIEPORTBUS and set
ath9k.use_msi=1 to get MSI for that card. However, it seems that ath9k
does not behave well in that setup. It does get interrupts and seems to
work to some degree, but I was not successful in transmitting data over
WiFi. That might be an entirely different thing, though.

However, what I noticed is that when CONFIG_PCIEPORTBUS and
CONFIG_PCI_MSI are enabled, MSI works but legacy interrupts seem not to
fire anymore. That is true for ath9k as well as e1000e (using
e1000e.IntMode=0 to force legacy). Is that a known issue/limitation with
i.MX 6 PCIe?

--
Stefan
Lucas Stach Nov. 28, 2018, 5:50 p.m. UTC | #11
On Wed, 2018-11-28 at 18:36 +0100, Stefan Agner wrote:
> On 28.11.2018 13:19, Stefan Agner wrote:
> > On 21.11.2018 14:47, Leonard Crestez wrote:
> > > On 11/20/2018 11:28 PM, Trent Piepho wrote:
> > > > On Tue, 2018-11-20 at 21:42 +0100, Stefan Agner wrote:
> > > > > On 20.11.2018 20:13, Trent Piepho wrote:
> > > > > > It also seems to me that this doesn't need to be in the internal pci
> > > > > > config access functions.  The driver shouldn't be reading registers
> > > > > > that don't exist anyway.  It's really about trying to fix sysfs access
> > > > > > to registers that don't exist.  So maybe it should be done there.
> > > > > 
> > > > > That was my first approach, see:
> > > > 
> > > > Yes, but that just used the pci device id which applies to every IMX
> > > > design.
> > > > 
> > > > It's also not totally correct, as it seems real registers after 0x200
> > > > do work on imx6, and that would prevent access to them.
> > > 
> > > I see that Lorenzo already accepted the patch in pci/dwc:
> > > 
> > > https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/dwc&id=f14eaec153aaebbe940ddd21e4198cc2abc927c2
> > > 
> > > My tests show that this series breaks pci cards on 6qdl and I think it
> > > should be reverted until a fix is found. Are you OK with this?
> > > 
> > > Fixing might require an entirely different approach.
> > 
> > I tried to reproduce this issue on Apalis iMX6 (i.MX 6Q) with a ath9k
> > PCIe WiFi card, the issue you are seeing did not happen. My lspci looks
> > as follows:
> > 
> > root@ea210c63d739:/# lspci -v
> > 00:00.0 PCI bridge: Synopsys, Inc. Device abcd (rev 01) (prog-if 00
> > [Normal decode])
> >         Flags: bus master, fast devsel, latency 0, IRQ 255
> >         Memory at 01000000 (32-bit, non-prefetchable) [size=1M]
> >         Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
> >         Memory behind bridge: 01100000-011fffff
> >         [virtual] Expansion ROM at 01200000 [disabled] [size=64K]
> >         Capabilities: [40] Power Management version 3
> >         Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
> >         Capabilities: [70] Express Root Port (Slot-), MSI 00
> >         Capabilities: [100] Advanced Error Reporting
> >         Capabilities: [140] Virtual Channel
> > lspci: Unable to load libkmod resources: error -12
> > 
> > 01:00.0 Network controller: Qualcomm Atheros AR928X Wireless Network
> > Adapter (PCI-Express) (rev 01)
> >         Subsystem: Foxconn International, Inc. Device e007
> >         Flags: bus master, fast devsel, latency 0, IRQ 312
> >         Memory at 01100000 (64-bit, non-prefetchable) [size=64K]
> >         Capabilities: [40] Power Management version 2
> >         Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
> >         Capabilities: [60] Express Legacy Endpoint, MSI 00
> >         Capabilities: [90] MSI-X: Enable- Count=1 Masked-
> >         Capabilities: [100] Advanced Error Reporting
> >         Capabilities: [140] Virtual Channel
> >         Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
> >         Kernel driver in use: ath9k
> > 
> > 
> > I also set up a WiFi network and transmitted some packets, but I did
> > not get a "nobody cared" message. Do you have an idea why that might be?
> > 
> > # cat /proc/interrupts
> > ...
> > 312:      10967          0          0          0       GPC 123 Level    
> > ath9k
> > ...
> > 
> > 
> > Your conclusion in this thread seem reasonable, hence reverting the
> > issue does. However, I still would like to reproduce the issue so I can
> > make sure that future patches don't break it :-)
> 
> Hm, I realized that I need to enable CONFIG_PCIEPORTBUS and set
> ath9k.use_msi=1 to get MSI for that card. However, it seems that ath9k
> does not behave well in that setup. It does get interrupts, and seems to
> work to some degree, but I was not successful in transmitting data over
> WiFi, but that might be an entirely different thing.
> 
> However, what I noticed is that when CONFIG_PCIEPORTBUS and
> CONFIG_PCI_MSI is enabled, MSI works but legacy interrupt seem not to
> fire anymore. That is true for ath9k as well as e1000e (using
> e1000e.IntMode=0 to force legacy). Is that a known issue/limitation with
> i.MX 6 PCIe?

Yes, this is a known issue with the Designware PCIe core, not just on
i.MX6. As soon as any MSI interrupt is enabled, the core doesn't
forward legacy IRQs anymore.

So if any card in your system needs legacy interrupts (and ath9k is
very likely to need this, as MSI support is pretty new and
experimental), you need to boot with "nomsi" set on the kernel command
line.

Regards,
Lucas
Stefan Agner Nov. 28, 2018, 5:56 p.m. UTC | #12
On 28.11.2018 18:50, Lucas Stach wrote:
> On Wed, 2018-11-28 at 18:36 +0100, Stefan Agner wrote:
>> On 28.11.2018 13:19, Stefan Agner wrote:
>> > On 21.11.2018 14:47, Leonard Crestez wrote:
>> > > On 11/20/2018 11:28 PM, Trent Piepho wrote:
>> > > > On Tue, 2018-11-20 at 21:42 +0100, Stefan Agner wrote:
>> > > > > On 20.11.2018 20:13, Trent Piepho wrote:
>> > > > > > It also seems to me that this doesn't need to be in the internal pci
>> > > > > > config access functions.  The driver shouldn't be reading registers
>> > > > > > that don't exist anyway.  It's really about trying to fix sysfs access
>> > > > > > to registers that don't exist.  So maybe it should be done there.
>> > > > >
>> > > > > That was my first approach, see:
>> > > >
>> > > > Yes, but that just used the pci device id which applies to every IMX
>> > > > design.
>> > > >
>> > > > It's also not totally correct, as it seems real registers after 0x200
>> > > > do work on imx6, and that would prevent access to them.
>> > >
>> > > I see that Lorenzo already accepted the patch in pci/dwc:
>> > >
>> > > https://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/pci.git/commit/?h=pci/dwc&id=f14eaec153aaebbe940ddd21e4198cc2abc927c2
>> > >
>> > > My tests show that this series breaks pci cards on 6qdl and I think it
>> > > should be reverted until a fix is found. Are you OK with this?
>> > >
>> > > Fixing might require an entirely different approach.
>> >
>> > I tried to reproduce this issue on Apalis iMX6 (i.MX 6Q) with a ath9k
>> > PCIe WiFi card, the issue you are seeing did not happen. My lspci looks
>> > as follows:
>> >
>> > root@ea210c63d739:/# lspci -v
>> > 00:00.0 PCI bridge: Synopsys, Inc. Device abcd (rev 01) (prog-if 00
>> > [Normal decode])
>> >         Flags: bus master, fast devsel, latency 0, IRQ 255
>> >         Memory at 01000000 (32-bit, non-prefetchable) [size=1M]
>> >         Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
>> >         Memory behind bridge: 01100000-011fffff
>> >         [virtual] Expansion ROM at 01200000 [disabled] [size=64K]
>> >         Capabilities: [40] Power Management version 3
>> >         Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
>> >         Capabilities: [70] Express Root Port (Slot-), MSI 00
>> >         Capabilities: [100] Advanced Error Reporting
>> >         Capabilities: [140] Virtual Channel
>> > lspci: Unable to load libkmod resources: error -12
>> >
>> > 01:00.0 Network controller: Qualcomm Atheros AR928X Wireless Network Adapter (PCI-Express) (rev 01)
>> >         Subsystem: Foxconn International, Inc. Device e007
>> >         Flags: bus master, fast devsel, latency 0, IRQ 312
>> >         Memory at 01100000 (64-bit, non-prefetchable) [size=64K]
>> >         Capabilities: [40] Power Management version 2
>> >         Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit-
>> >         Capabilities: [60] Express Legacy Endpoint, MSI 00
>> >         Capabilities: [90] MSI-X: Enable- Count=1 Masked-
>> >         Capabilities: [100] Advanced Error Reporting
>> >         Capabilities: [140] Virtual Channel
>> >         Capabilities: [160] Device Serial Number 00-00-00-00-00-00-00-00
>> >         Kernel driver in use: ath9k
>> >
>> >
>> > I also set up a WiFi network and transmitted some packets, but I did
>> > not get a "nobody cared" message. Do you have an idea why that might be?
>> >
>> > # cat /proc/interrupts
>> > ...
>> > 312:      10967          0          0          0       GPC 123 Level     ath9k
>> > ...
>> >
>> >
>> > Your conclusions in this thread seem reasonable, and so does reverting
>> > the patch. However, I would still like to reproduce the issue so I can
>> > make sure that future patches don't break it :-)
>>
>> Hm, I realized that I need to enable CONFIG_PCIEPORTBUS and set
>> ath9k.use_msi=1 to get MSI for that card. However, it seems that ath9k
>> does not behave well in that setup. It does get interrupts, and seems to
>> work to some degree, but I was not successful in transmitting data over
>> WiFi; that might be an entirely different issue, though.
>>
>> However, what I noticed is that when CONFIG_PCIEPORTBUS and
>> CONFIG_PCI_MSI are enabled, MSI works but legacy interrupts seem not to
>> fire anymore. That is true for ath9k as well as e1000e (using
>> e1000e.IntMode=0 to force legacy). Is that a known issue/limitation with
>> i.MX 6 PCIe?
> 
> Yes, this is a known issue with the Designware PCIe core, not just on
> i.MX6. As soon as any MSI interrupt is enabled, the core doesn't
> forward legacy IRQs anymore.

Oh I see, unfortunate!

> 
> So if any card in your system needs legacy interrupts (and ath9k is
> very likely to need this, as MSI support is pretty new and
> experimental), you need to boot with "pci=nomsi" set on the kernel
> command line.

Ok, thanks for clarification.

FWIW, e1000e with MSI works perfectly fine; it's just ath9k with MSI
forced via the kernel parameter that does not really work. I guess that
is the reason it is not enabled by default.

--
Stefan
Leonard Crestez Nov. 28, 2018, 6:01 p.m. UTC | #13
On Wed, 2018-11-28 at 18:36 +0100, Stefan Agner wrote:
> On 28.11.2018 13:19, Stefan Agner wrote:
> > On 21.11.2018 14:47, Leonard Crestez wrote:
> > > My tests show that this series breaks pci cards on 6qdl and I
> > > think it should be reverted until a fix is found. Are you OK with
> > > this?
> > > 
> > > Fixing might require an entirely different approach.
> > 
> > I tried to reproduce this issue on Apalis iMX6 (i.MX 6Q) with an ath9k
> > PCIe WiFi card, but the issue you are seeing did not happen. My lspci
> > output looks as follows:

I think in order to get "irq: nobody cared" you need to do a soft
reboot after scanning with the card on a functional kernel.

A better way to check for issues is to print when the dbi_length check
fails. I get stack traces like this:

imx6q-pcie 1ffc000.pcie: host bridge /soc/pcie@1ffc000 ranges:
imx6q-pcie 1ffc000.pcie: Parsing ranges property...
imx6q-pcie 1ffc000.pcie:    IO 0x01f80000..0x01f8ffff -> 0x00000000
imx6q-pcie 1ffc000.pcie:   MEM 0x01000000..0x01efffff -> 0x01000000
refuse rd_own_conf where=828
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc1-00012-g864d906b2f62 #3
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[<c01133cc>] (unwind_backtrace) from [<c010da90>] (show_stack+0x10/0x14)
[<c010da90>] (show_stack) from [<c0b30610>] (dump_stack+0x8c/0xac)
[<c0b30610>] (dump_stack) from [<c04f1ff4>] (dw_pcie_rd_own_conf+0x6c/0x78)
[<c04f1ff4>] (dw_pcie_rd_own_conf) from [<c04f2db4>] (dw_pcie_setup_rc+0x54/0x394)
[<c04f2db4>] (dw_pcie_setup_rc) from [<c04f46bc>] (imx6_pcie_host_init+0x80/0x138)
[<c04f46bc>] (imx6_pcie_host_init) from [<c04f2ad8>] (dw_pcie_host_init+0x258/0x4e0)
[<c04f2ad8>] (dw_pcie_host_init) from [<c04f4380>] (imx6_pcie_probe+0x2b8/0x574)
[<c04f4380>] (imx6_pcie_probe) from [<c05d4834>] (platform_drv_probe+0x48/0x98)
[<c05d4834>] (platform_drv_probe) from [<c05d2630>] (really_probe+0x2a8/0x40c)
[<c05d2630>] (really_probe) from [<c05d293c>] (driver_probe_device+0x6c/0x1c0)
[<c05d293c>] (driver_probe_device) from [<c05d2ba0>] (__driver_attach+0x110/0x138)
[<c05d2ba0>] (__driver_attach) from [<c05d04e4>] (bus_for_each_dev+0x70/0xb4)
[<c05d04e4>] (bus_for_each_dev) from [<c05d17f0>] (bus_add_driver+0x1a4/0x264)
[<c05d17f0>] (bus_add_driver) from [<c05d38b4>] (driver_register+0x74/0x108)
[<c05d38b4>] (driver_register) from [<c0102f8c>] (do_one_initcall+0x80/0x26c)
[<c0102f8c>] (do_one_initcall) from [<c11010cc>] (kernel_init_freeable+0x268/0x33c)
[<c11010cc>] (kernel_init_freeable) from [<c0b492e0>] (kernel_init+0x8/0x114)
[<c0b492e0>] (kernel_init) from [<c01010e8>] (ret_from_fork+0x14/0x2c)

If rd/wr_own_conf is incorrectly ignored then behavior is somewhat unpredictable.
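
For illustration only, the kind of check and debug print described above
can be sketched in user space roughly as follows. This is not the kernel
code: struct fake_pcie, fake_rd_own_conf() and FAKE_DBI_SIZE are
hypothetical names, the register space is a plain byte array, and
returning all-ones for a refused read is an assumption. The offset 0x828
assumes that "where=828" in the log is printed in hexadecimal; 0x15c is
the IMX6Q limit from the patch.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define FAKE_DBI_SIZE     0x1000 /* size of the simulated register window */
#define REFUSED_OFFSET    0x828  /* offset from the log, assumed to be hex */
#define IMX6Q_DBI_LENGTH  0x15c  /* limit the patch sets for IMX6Q */

struct fake_pcie {
	uint8_t dbi[FAKE_DBI_SIZE]; /* simulated DBI register space */
	int dbi_length;             /* 0 means "no limit configured" */
	unsigned int refused;       /* count of accesses beyond dbi_length */
};

/*
 * Read `size` bytes (1, 2 or 4) at offset `where`, little-endian.
 * Accesses that end beyond dbi_length are not performed: they are
 * logged, counted, and reported as all-ones (an assumption made here).
 * Returns 0 on success, -1 on an invalid size or offset.
 */
static int fake_rd_own_conf(struct fake_pcie *pci, int where, int size,
			    uint32_t *val)
{
	uint32_t v = 0;
	int i;

	if (size != 1 && size != 2 && size != 4)
		return -1;
	if (where < 0 || where + size > FAKE_DBI_SIZE)
		return -1;

	if (pci->dbi_length && where + size > pci->dbi_length) {
		/* Make silently dropped accesses visible while debugging. */
		fprintf(stderr, "refuse rd_own_conf where=%x\n",
			(unsigned int)where);
		pci->refused++;
		*val = 0xffffffffu;
		return 0;
	}

	for (i = 0; i < size; i++)
		v |= (uint32_t)pci->dbi[where + i] << (8 * i);
	*val = v;
	return 0;
}
```

With dbi_length set to 0x15c, a 4-byte read at 0x828 is refused and
counted, while a read that ends at or below 0x15c goes through. Counting
or printing the refused accesses is what makes this failure mode visible
without waiting for an "irq: nobody cared" message.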

--
Regards,
Leonard

Patch

diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
index cdcf54ff30fb..7015bda22aef 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -43,6 +43,7 @@  enum imx6_pcie_variants {
 
 struct imx6_pcie_drvdata {
 	enum imx6_pcie_variants variant;
+	int			dbi_length;
 };
 
 struct imx6_pcie {
@@ -981,6 +982,8 @@  static int imx6_pcie_probe(struct platform_device *pdev)
 		break;
 	}
 
+	pci->dbi_length = imx6_pcie->drvdata->dbi_length;
+
 	/* Grab turnoff reset */
 	imx6_pcie->turnoff_reset = devm_reset_control_get_optional_exclusive(dev, "turnoff");
 	if (IS_ERR(imx6_pcie->turnoff_reset)) {
@@ -1052,7 +1055,7 @@  static void imx6_pcie_shutdown(struct platform_device *pdev)
 }
 
 static const struct imx6_pcie_drvdata drvdata[] = {
-	[IMX6Q] = { .variant = IMX6Q },
+	[IMX6Q] = { .variant = IMX6Q, .dbi_length = 0x15c },
 	[IMX6SX] = { .variant = IMX6SX },
 	[IMX6QP] = { .variant = IMX6QP },
 	[IMX7D] = { .variant = IMX7D },