diff mbox

[v2] PCI: mvebu - Support a bridge with no IO port window

Message ID 1381868182-8544-1-git-send-email-jgunthorpe@obsidianresearch.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jason Gunthorpe Oct. 15, 2013, 8:16 p.m. UTC
Make pcie-io-aperture and the IO port MBUS ID in ranges optional.
If not provided the bridge reports to Linux that IO space mapping is
not supported and refuses to configure an IO mbus window.

This allows both complete disable (do not specify pcie-io-aperture) and
per-port disable (do not specify a IO target ranges entry for the port)

Most PCIE devices these days do not require IO support to function,
so having an option to disable it in the driver is useful.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 drivers/pci/host/pci-mvebu.c | 64 +++++++++++++++++++++++++++++---------------
 1 file changed, 43 insertions(+), 21 deletions(-)

V2 changes:
 - Fixed logic in mvebu_has_ioport [thomas]
 - Included secondary_status in the IO_BASE reply [thomas]
 - Fixed range check for disabled IO windows

Comments

Jason Cooper Oct. 17, 2013, 1:10 p.m. UTC | #1
On Tue, Oct 15, 2013 at 02:16:22PM -0600, Jason Gunthorpe wrote:
> Make pcie-io-aperture and the IO port MBUS ID in ranges optional.
> If not provided the bridge reports to Linux that IO space mapping is
> not supported and refuses to configure an IO mbus window.
> 
> This allows both complete disable (do not specify pcie-io-aperture) and
> per-port disable (do not specify a IO target ranges entry for the port)
> 
> Most PCIE devices these days do not require IO support to function,
> so having an option to disable it in the driver is useful.
> 
> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> ---
>  drivers/pci/host/pci-mvebu.c | 64 +++++++++++++++++++++++++++++---------------
>  1 file changed, 43 insertions(+), 21 deletions(-)

Applied to mvebu/drivers

thx,

Jason.
Jason Gunthorpe Oct. 17, 2013, 2:41 p.m. UTC | #2
On Thu, Oct 17, 2013 at 09:10:31AM -0400, Jason Cooper wrote:
> On Tue, Oct 15, 2013 at 02:16:22PM -0600, Jason Gunthorpe wrote:
> > Make pcie-io-aperture and the IO port MBUS ID in ranges optional.
> > If not provided the bridge reports to Linux that IO space mapping is
> > not supported and refuses to configure an IO mbus window.
> > 
> > This allows both complete disable (do not specify pcie-io-aperture) and
> > per-port disable (do not specify a IO target ranges entry for the port)
> > 
> > Most PCIE devices these days do not require IO support to function,
> > so having an option to disable it in the driver is useful.
> > 
> > Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> >  drivers/pci/host/pci-mvebu.c | 64 +++++++++++++++++++++++++++++---------------
> >  1 file changed, 43 insertions(+), 21 deletions(-)
> 
> Applied to mvebu/drivers

If possible, Thomas should test this, IMHO. My hardware here
completely lacks IO port windows so I cannot meaningfully test that
aspect of the change.

Thanks,
Jason
Jason Cooper Oct. 17, 2013, 2:47 p.m. UTC | #3
On Thu, Oct 17, 2013 at 08:41:12AM -0600, Jason Gunthorpe wrote:
> On Thu, Oct 17, 2013 at 09:10:31AM -0400, Jason Cooper wrote:
> > On Tue, Oct 15, 2013 at 02:16:22PM -0600, Jason Gunthorpe wrote:
> > > Make pcie-io-aperture and the IO port MBUS ID in ranges optional.
> > > If not provided the bridge reports to Linux that IO space mapping is
> > > not supported and refuses to configure an IO mbus window.
> > > 
> > > This allows both complete disable (do not specify pcie-io-aperture) and
> > > per-port disable (do not specify a IO target ranges entry for the port)
> > > 
> > > Most PCIE devices these days do not require IO support to function,
> > > so having an option to disable it in the driver is useful.
> > > 
> > > Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> > >  drivers/pci/host/pci-mvebu.c | 64 +++++++++++++++++++++++++++++---------------
> > >  1 file changed, 43 insertions(+), 21 deletions(-)
> > 
> > Applied to mvebu/drivers
> 
> If possible, Thomas should test this, IMHO. My hardware here
> completely lacks IO port windows so I cannot meaningfully test that
> aspect of the change.

Agreed, Thomas and I were just discussing this on #mvlinux.  I took this
in to get it as much coverage in -next as possible.  Since we are getting
close to the end of the merge window and Stephen Rothwell is on
vacation, so -next testing isn't as smooth and predictable as usual.

Once Thomas (and others, don't be shy!) has tested it, I'll either add
the Tested-by or drop it for a new version.

thx,

Jason.
Thomas Petazzoni Oct. 31, 2013, 9:13 a.m. UTC | #4
Dear Jason Gunthorpe,

On Tue, 15 Oct 2013 14:16:22 -0600, Jason Gunthorpe wrote:
> Make pcie-io-aperture and the IO port MBUS ID in ranges optional.
> If not provided the bridge reports to Linux that IO space mapping is
> not supported and refuses to configure an IO mbus window.
> 
> This allows both complete disable (do not specify pcie-io-aperture) and
> per-port disable (do not specify a IO target ranges entry for the port)
> 
> Most PCIE devices these days do not require IO support to function,
> so having an option to disable it in the driver is useful.
> 
> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>

Sorry for the delay in testing this, I was busy with kernel summit /
ELCE. Unfortunately, this patch still causes problems here: it breaks
usage of I/O region. I have a modified version of the e1000e driver
that makes it access an I/O region, that I have used for testing that
I/O handling is at least minimally working. Without your patch, it
works fine:

e1000e: Intel(R) PRO/1000 Network Driver - 2.3.2-k
e1000e: Copyright(c) 1999 - 2013 Intel Corporation.
PCI: enabling device 0000:00:01.0 (0140 -> 0143)
PCI: enabling device 0000:01:00.0 (0146 -> 0147)
e1000e 0000:01:00.0: ==> e1000e: I/O start 0x10000, len 0x20
e1000e 0000:01:00.0: 1. 0x0 0x0 0xffffffff 0xffffffff
e1000e 0000:01:00.0: 2. 0x38 0x8100 0xffffffff 0xffffffff

(the last three messages are the one indicating that the I/O region is
working).

With your patch, it completely blows up when accessing the region:

e1000e: Intel(R) PRO/1000 Network Driver - 2.3.2-k
e1000e: Copyright(c) 1999 - 2013 Intel Corporation.
PCI: enabling device 0000:00:01.0 (0140 -> 0143)
PCI: enabling device 0000:01:00.0 (0146 -> 0147)
e1000e 0000:01:00.0: ==> e1000e: I/O start 0x10000, len 0x20
Unable to handle kernel paging request at virtual address fee10000
pgd = ee9a0000
[fee10000] *pgd=00000000
Internal error: Oops: 15 [#1] SMP ARM
Modules linked in:
CPU: 3 PID: 1 Comm: swapper/0 Not tainted 3.12.0-rc7-00005-g1fa02be #166
task: ef049c40 ti: ef05c000 task.ti: ef05c000
PC is at e1000_probe+0x108/0xd54
LR is at e1000_probe+0xfc/0xd54
pc : [<c0239f54>]    lr : [<c0239f48>]    psr: 60000013
sp : ef05de30  ip : 00000000  fp : 00000000
r10: 00010000  r9 : ef05c000  r8 : fee10000
r7 : 00000001  r6 : c04037a8  r5 : ef2dac68  r4 : ef2dac00
r3 : 000003cd  r2 : c07f6570  r1 : 20000093  r0 : 0000003c
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c53c7d  Table: 2e9a006a  DAC: 00000015
Process swapper/0 (pid: 1, stack limit = 0xef05c240)
Stack: (0xef05de30 to 0xef05e000)
de20:                                     ef049c40 00000000 eeb5ad08 ef2d7508
de40: ef050000 c00f8890 ef2dac68 c003cae8 00000000 ef2dac68 c0809e0c ef2dac00
de60: 00000000 c0809dd8 ef05c000 c0516458 00000000 c01821d4 c083c148 ef2dac68
de80: 00000000 c0809e0c c052bf10 c01b8fc0 00000000 ef2dac68 c0809e0c ef2dac9c
dea0: 00000000 c01b9168 00000000 c0809e0c c01b90dc c01b7620 ef09b11c ef2d59f4
dec0: c0809e0c eeb587c0 c07fe8c8 c01b86c8 c0809e54 ef05deec c0809e0c 00000006
dee0: c0538528 c0818700 c052bf10 c01b97a0 c0809dd8 c0540198 00000006 c0008878
df00: 00000099 c2058488 c04961f4 00000081 00000000 c04f2f14 00000063 ef05df30
df20: c0035e14 c0035e38 00000113 ffffffff c20584c8 c03e8428 00000099 c0036000
df40: c04f2704 00000006 c20584f1 00000006 c07f5aa0 c0540198 00000006 c0538528
df60: c0818700 00000099 c0538534 c0516458 00000000 c0516b90 00000006 00000006
df80: c0516458 000000a2 00000000 c03ccc4c 00000000 00000000 00000000 00000000
dfa0: 00000000 c03ccc54 00000000 c000e3f8 00000000 00000000 00000000 00000000
dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
dfe0: 00000000 00000000 00000000 00000000 00000013 00000000 08d8a360 34d44910
[<c0239f54>] (e1000_probe+0x108/0xd54) from [<c01821d4>] (pci_device_probe+0x64/0x88)
[<c01821d4>] (pci_device_probe+0x64/0x88) from [<c01b8fc0>] (driver_probe_device+0xf4/0x210)
[<c01b8fc0>] (driver_probe_device+0xf4/0x210) from [<c01b9168>] (__driver_attach+0x8c/0x90)
[<c01b9168>] (__driver_attach+0x8c/0x90) from [<c01b7620>] (bus_for_each_dev+0x54/0x88)
[<c01b7620>] (bus_for_each_dev+0x54/0x88) from [<c01b86c8>] (bus_add_driver+0xd4/0x258)
[<c01b86c8>] (bus_add_driver+0xd4/0x258) from [<c01b97a0>] (driver_register+0x78/0xf4)
[<c01b97a0>] (driver_register+0x78/0xf4) from [<c0008878>] (do_one_initcall+0xe4/0x140)
[<c0008878>] (do_one_initcall+0xe4/0x140) from [<c0516b90>] (kernel_init_freeable+0xfc/0x1c8)
[<c0516b90>] (kernel_init_freeable+0xfc/0x1c8) from [<c03ccc54>] (kernel_init+0x8/0xe4)
[<c03ccc54>] (kernel_init+0x8/0xe4) from [<c000e3f8>] (ret_from_fork+0x14/0x3c)
Code: ebfdef04 e594a198 e7f3805a e2488612 (e5982000) 
---[ end trace 0b6062d8d91eb05b ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

CPU2: stopping
CPU: 2 PID: 0 Comm: swapper/2 Tainted: G      D      3.12.0-rc7-00005-g1fa02be #166
[<c001580c>] (unwind_backtrace+0x0/0xf8) from [<c0011790>] (show_stack+0x10/0x14)
[<c0011790>] (show_stack+0x10/0x14) from [<c03d082c>] (dump_stack+0x70/0x8c)
[<c03d082c>] (dump_stack+0x70/0x8c) from [<c001418c>] (handle_IPI+0xfc/0x130)
[<c001418c>] (handle_IPI+0xfc/0x130) from [<c0008584>] (armada_370_xp_handle_irq+0x84/0xac)
[<c0008584>] (armada_370_xp_handle_irq+0x84/0xac) from [<c0012240>] (__irq_svc+0x40/0x50)
Exception stack(0xef079fa0 to 0xef079fe8)
9fa0: c206b7d8 00000000 000005ca 000005ca ef078000 c08185ec c07f0460 c08185ec
9fc0: ef078000 c03d7764 00000001 ef078000 00000510 ef079fe8 c000f570 c00510b0
9fe0: 60000113 ffffffff
[<c0012240>] (__irq_svc+0x40/0x50) from [<c00510b0>] (cpu_startup_entry+0x54/0x128)
[<c00510b0>] (cpu_startup_entry+0x54/0x128) from [<00008644>] (0x8644)
CPU1: stopping
CPU: 1 PID: 0 Comm: swapper/1 Tainted: G      D      3.12.0-rc7-00005-g1fa02be #166
[<c001580c>] (unwind_backtrace+0x0/0xf8) from [<c0011790>] (show_stack+0x10/0x14)
[<c0011790>] (show_stack+0x10/0x14) from [<c03d082c>] (dump_stack+0x70/0x8c)
[<c03d082c>] (dump_stack+0x70/0x8c) from [<c001418c>] (handle_IPI+0xfc/0x130)
[<c001418c>] (handle_IPI+0xfc/0x130) from [<c0008584>] (armada_370_xp_handle_irq+0x84/0xac)
[<c0008584>] (armada_370_xp_handle_irq+0x84/0xac) from [<c0012240>] (__irq_svc+0x40/0x50)
Exception stack(0xef077fa0 to 0xef077fe8)
7fa0: c20637d8 00000000 000012ac 000012ac ef076000 c08185ec c07f0460 c08185ec
7fc0: ef076000 c03d7764 00000001 ef076000 00000510 ef077fe8 c000f570 c00510b0
7fe0: 60000113 ffffffff
[<c0012240>] (__irq_svc+0x40/0x50) from [<c00510b0>] (cpu_startup_entry+0x54/0x128)
[<c00510b0>] (cpu_startup_entry+0x54/0x128) from [<00008644>] (0x8644)
CPU0: stopping
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G      D      3.12.0-rc7-00005-g1fa02be #166
[<c001580c>] (unwind_backtrace+0x0/0xf8) from [<c0011790>] (show_stack+0x10/0x14)
[<c0011790>] (show_stack+0x10/0x14) from [<c03d082c>] (dump_stack+0x70/0x8c)
[<c03d082c>] (dump_stack+0x70/0x8c) from [<c001418c>] (handle_IPI+0xfc/0x130)
[<c001418c>] (handle_IPI+0xfc/0x130) from [<c0008584>] (armada_370_xp_handle_irq+0x84/0xac)
[<c0008584>] (armada_370_xp_handle_irq+0x84/0xac) from [<c0012240>] (__irq_svc+0x40/0x50)
Exception stack(0xc07e9f70 to 0xc07e9fb8)
9f60:                                     c205b7d8 00000000 00000d2c 00000d2c
9f80: c07e8000 c08185ec c07f0460 c08185ec c07e8000 c03d7764 00000001 c07e8000
9fa0: 00000510 c07e9fb8 c000f570 c00510b0 60000113 ffffffff
[<c0012240>] (__irq_svc+0x40/0x50) from [<c00510b0>] (cpu_startup_entry+0x54/0x128)
[<c00510b0>] (cpu_startup_entry+0x54/0x128) from [<c0516a40>] (start_kernel+0x2a0/0x2f4)
[<c0516a40>] (start_kernel+0x2a0/0x2f4) from [<00008074>] (0x8074)

After debugging, it turns out that the following change is the problem:

-	if (port->bridge.iolimit < port->bridge.iobase ||
+	if (port->bridge.iolimit <= port->bridge.iobase ||


with this change, the mvebu_pcie_handle_iobase_change() always
considers the I/O region settings as "invalid", and therefore the I/O
window is never created and the pci_ioremap_io() function is never
called. Reverting this change makes your patch works fine for me.

Best regards,

Thomas
Jason Gunthorpe Oct. 31, 2013, 4:48 p.m. UTC | #5
On Thu, Oct 31, 2013 at 10:13:57AM +0100, Thomas Petazzoni wrote:

> Sorry for the delay in testing this, I was busy with kernel summit /
> ELCE. Unfortunately, this patch still causes problems here: it breaks
> usage of I/O region. I have a modified version of the e1000e driver
> that makes it access an I/O region, that I have used for testing that
> I/O handling is at least minimally working. Without your patch, it
> works fine:

No problem, thank you for testing it, I don't have any hardware like
this..

> After debugging, it turns out that the following change is the problem:
> 
> -	if (port->bridge.iolimit < port->bridge.iobase ||
> +	if (port->bridge.iolimit <= port->bridge.iobase ||

Indeed.. I was having problems here because linux was writing 0,0
during discovery to the base,limit registers to 'disable' the IO
window, which triggered a window allocation. So I re-read the PCI
bridge spec (apparently too quickly) and decided 0,0 was OK, and it
should be <=.

However, looking again at the spec - it is very clear, < is the
correct test, and when the values is equal a 4k window should be
created.

So, I wonder if there is a little bug in the Linux discovery code
path, should it write FFFF,0 instead?

In any event, I can respin this patch so it works, I have to drop the
WARN_ON in that routine to accommodate the 0,0 write however.

> window is never created and the pci_ioremap_io() function is never
> called. Reverting this change makes your patch works fine for me.

Great, the next version should be good to go then.

Thanks,
Jason
Bjorn Helgaas Oct. 31, 2013, 5:10 p.m. UTC | #6
On Thu, Oct 31, 2013 at 10:48 AM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:

> Indeed.. I was having problems here because linux was writing 0,0
> during discovery to the base,limit registers to 'disable' the IO
> window, which triggered a window allocation. So I re-read the PCI
> bridge spec (apparently too quickly) and decided 0,0 was OK, and it
> should be <=.
>
> However, looking again at the spec - it is very clear, < is the
> correct test, and when the values is equal a 4k window should be
> created.
>
> So, I wonder if there is a little bug in the Linux discovery code
> path, should it write FFFF,0 instead?

Sounds like a bug to me.  Do you want to fix it, or point me at it the
place where we write 0,0 so I can fix it?

Bjorn
Jason Gunthorpe Oct. 31, 2013, 11:32 p.m. UTC | #7
On Thu, Oct 31, 2013 at 11:10:42AM -0600, Bjorn Helgaas wrote:
> On Thu, Oct 31, 2013 at 10:48 AM, Jason Gunthorpe
> <jgunthorpe@obsidianresearch.com> wrote:
> 
> > Indeed.. I was having problems here because linux was writing 0,0
> > during discovery to the base,limit registers to 'disable' the IO
> > window, which triggered a window allocation. So I re-read the PCI
> > bridge spec (apparently too quickly) and decided 0,0 was OK, and it
> > should be <=.
> >
> > However, looking again at the spec - it is very clear, < is the
> > correct test, and when the values is equal a 4k window should be
> > created.
> >
> > So, I wonder if there is a little bug in the Linux discovery code
> > path, should it write FFFF,0 instead?
> 
> Sounds like a bug to me.  Do you want to fix it, or point me at it the
> place where we write 0,0 so I can fix it?

Okay, I remember this now, it was annoying to debug because the PCI
driver inits before the serial port starts working, fortunately
Sebastian fixed that up, now it is much easier to see what is going
on..

The 0,0 write comes from this trace:

 [<c0111b58>] (mvebu_pcie_handle_iobase_change+0x0/0x140) from [<c0111e70>] (mvebu_pcie_wr_conf+0x1d8/0x318) r5:00000000 r4:c78d0850
 [<c0111c98>] (mvebu_pcie_wr_conf+0x0/0x318) from [<c0100e90>] (pci_bus_write_config_word+0x40/0x4c)
 [<c0100e50>] (pci_bus_write_config_word+0x0/0x4c) from [<c02273d0>] (__pci_bus_size_bridges+0x2e4/0x744) r4:00000000
 [<c02270ec>] (__pci_bus_size_bridges+0x0/0x744) from [<c0227344>] (__pci_bus_size_bridges+0x258/0x744)
 [<c02270ec>] (__pci_bus_size_bridges+0x0/0x744) from [<c0227844>] (pci_bus_size_bridges+0x14/0x18)
 [<c0227830>] (pci_bus_size_bridges+0x0/0x18) from [<c000cdc0>] (pci_common_init_dev+0x26c/0x2c0)
 [<c000cb54>] (pci_common_init_dev+0x0/0x2c0) from [<c0111608>] (mvebu_pcie_probe+0x41c/0x480)
 [<c01111ec>] (mvebu_pcie_probe+0x0/0x480) from [<c01335f0>] (platform_drv_probe+0x1c/0x20)

And specifically this code sequence in pci_bridge_check_ranges:

	pci_read_config_word(bridge, PCI_IO_BASE, &io);
	if (!io) {
		pci_write_config_word(bridge, PCI_IO_BASE, 0xf0f0);
		pci_read_config_word(bridge, PCI_IO_BASE, &io);
 		pci_write_config_word(bridge, PCI_IO_BASE, 0x0);
 	}
 	if (io)
		b_res[0].flags |= IORESOURCE_IO;

If I read this properly, the first sets the IO window to a 4k region
at 0xF000, and the second sets the IO window to a 4k region at 0x0.

How about this instead:

	if (!io) {
	        /* Disable the IO window by setting limit < base */
		pci_write_config_word(bridge, PCI_IO_BASE, 0x00f0);
		pci_read_config_word(bridge, PCI_IO_BASE, &io);
	}
        /* Bridges without IO support must wire the IO register to 0 */
 	if (io)
		b_res[0].flags |= IORESOURCE_IO;

3.2.5.6 is clear that limit < base disables the decoder (however this
alone is not reliable if the upper 16 bits are programmed)

However, I'm guessing earlier in the sequence the PCI core clears
PCI_COMMAND_IO in the bridge command register? If so this isn't a
functional problem, it is just weird looking.

Which is the reason why my assert triggers, the driver ignores
PCI_COMMAND_IO/MEMORY.. Fixing that solves the issue completely.

Jason
Bjorn Helgaas Nov. 1, 2013, 2:44 a.m. UTC | #8
On Thu, Oct 31, 2013 at 5:32 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Thu, Oct 31, 2013 at 11:10:42AM -0600, Bjorn Helgaas wrote:
>> On Thu, Oct 31, 2013 at 10:48 AM, Jason Gunthorpe
>> <jgunthorpe@obsidianresearch.com> wrote:
>>
>> > Indeed.. I was having problems here because linux was writing 0,0
>> > during discovery to the base,limit registers to 'disable' the IO
>> > window, which triggered a window allocation. So I re-read the PCI
>> > bridge spec (apparently too quickly) and decided 0,0 was OK, and it
>> > should be <=.
>> >
>> > However, looking again at the spec - it is very clear, < is the
>> > correct test, and when the values is equal a 4k window should be
>> > created.
>> >
>> > So, I wonder if there is a little bug in the Linux discovery code
>> > path, should it write FFFF,0 instead?
>>
>> Sounds like a bug to me.  Do you want to fix it, or point me at it the
>> place where we write 0,0 so I can fix it?
>
> Okay, I remember this now, it was annoying to debug because the PCI
> driver inits before the serial port starts working, fortunately
> Sebastian fixed that up, now it is much easier to see what is going
> on..
>
> The 0,0 write comes from this trace:
>
>  [<c0111b58>] (mvebu_pcie_handle_iobase_change+0x0/0x140) from [<c0111e70>] (mvebu_pcie_wr_conf+0x1d8/0x318) r5:00000000 r4:c78d0850
>  [<c0111c98>] (mvebu_pcie_wr_conf+0x0/0x318) from [<c0100e90>] (pci_bus_write_config_word+0x40/0x4c)
>  [<c0100e50>] (pci_bus_write_config_word+0x0/0x4c) from [<c02273d0>] (__pci_bus_size_bridges+0x2e4/0x744) r4:00000000
>  [<c02270ec>] (__pci_bus_size_bridges+0x0/0x744) from [<c0227344>] (__pci_bus_size_bridges+0x258/0x744)
>  [<c02270ec>] (__pci_bus_size_bridges+0x0/0x744) from [<c0227844>] (pci_bus_size_bridges+0x14/0x18)
>  [<c0227830>] (pci_bus_size_bridges+0x0/0x18) from [<c000cdc0>] (pci_common_init_dev+0x26c/0x2c0)
>  [<c000cb54>] (pci_common_init_dev+0x0/0x2c0) from [<c0111608>] (mvebu_pcie_probe+0x41c/0x480)
>  [<c01111ec>] (mvebu_pcie_probe+0x0/0x480) from [<c01335f0>] (platform_drv_probe+0x1c/0x20)
>
> And specifically this code sequence in pci_bridge_check_ranges:
>
>         pci_read_config_word(bridge, PCI_IO_BASE, &io);
>         if (!io) {
>                 pci_write_config_word(bridge, PCI_IO_BASE, 0xf0f0);
>                 pci_read_config_word(bridge, PCI_IO_BASE, &io);
>                 pci_write_config_word(bridge, PCI_IO_BASE, 0x0);
>         }
>         if (io)
>                 b_res[0].flags |= IORESOURCE_IO;
>
> If I read this properly, the first sets the IO window to a 4k region
> at 0xF000, and the second sets the IO window to a 4k region at 0x0.

The purpose of this code is not to disable the I/O window; it is
merely to determine whether the bridge implements an I/O window.  The
configuration of the window (if implemented) is changed only
temporarily before being restored to the original config.

If the bridge does not implement an I/O window, the I/O Base and Limit
registers must be read-only and zero.  So if we read zero the first
time, either the bridge doesn't implement an I/O window, or it does
implement it and it is configured to the 4K region at 0x0.  The body
of the "if" does a write to see whether the registers are writable.

You obviously understand all this, so sorry for the repetition; I
guess I'm just trying to get it clear in my own mind :)

> How about this instead:
>
>         if (!io) {
>                 /* Disable the IO window by setting limit < base */
>                 pci_write_config_word(bridge, PCI_IO_BASE, 0x00f0);
>                 pci_read_config_word(bridge, PCI_IO_BASE, &io);
>         }
>         /* Bridges without IO support must wire the IO register to 0 */
>         if (io)
>                 b_res[0].flags |= IORESOURCE_IO;

If an I/O window happened to be configured to the 4K region at 0x0,
this disables it.  An I/O window configured anywhere else is left
enabled.  Previously the only effect of the function was to set bits
in b_res[].flags; with this change it would also sometimes disable an
I/O window.  That seems worse to me.  Am I just misunderstanding the
problem you're solving?

Bjorn
Jason Gunthorpe Nov. 1, 2013, 5:28 a.m. UTC | #9
On Thu, Oct 31, 2013 at 08:44:36PM -0600, Bjorn Helgaas wrote:
 
> You obviously understand all this, so sorry for the repetition; I
> guess I'm just trying to get it clear in my own mind :)

Ditto, it is tricky. When I first looked at that code I thought it was
trying to do something else..

> > How about this instead:
> >
> >         if (!io) {
> >                 /* Disable the IO window by setting limit < base */
> >                 pci_write_config_word(bridge, PCI_IO_BASE, 0x00f0);
> >                 pci_read_config_word(bridge, PCI_IO_BASE, &io);
> >         }
> >         /* Bridges without IO support must wire the IO register to 0 */
> >         if (io)
> >                 b_res[0].flags |= IORESOURCE_IO;
> 
> If an I/O window happened to be configured to the 4K region at 0x0,
> this disables it.  An I/O window configured anywhere else is left
> enabled.  Previously the only effect of the function was to set bits
> in b_res[].flags; with this change it would also sometimes disable an
> I/O window.  That seems worse to me.  Am I just misunderstanding the
> problem you're solving?

No, you've got it right. The problem I had is completely solved by
PATCH 1/2 PCI: mvebu - The bridge should obey the MEM and IO command
bits

But the code looks odd to me. 

It is certainly correct as written if the check is only ever called
with PCI_COMMAND_IO cleared. 

However, I am having trouble convincing myself of this. Notably
pci_setup_bridge does not seem to make that assumption.

If PCI_COMMAND_IO is set, it can just assume IORESOURCE_IO and skip
the testing of the PCI_IO_BASE register.

Why 0xf0f0 anyhow? :)

Jason
Bjorn Helgaas Nov. 1, 2013, 1:50 p.m. UTC | #10
On Thu, Oct 31, 2013 at 11:28 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Thu, Oct 31, 2013 at 08:44:36PM -0600, Bjorn Helgaas wrote:
>
>> You obviously understand all this, so sorry for the repetition; I
>> guess I'm just trying to get it clear in my own mind :)
>
> Ditto, it is tricky. When I first looked at that code I thought it was
> trying to do something else..
>
>> > How about this instead:
>> >
>> >         if (!io) {
>> >                 /* Disable the IO window by setting limit < base */
>> >                 pci_write_config_word(bridge, PCI_IO_BASE, 0x00f0);
>> >                 pci_read_config_word(bridge, PCI_IO_BASE, &io);
>> >         }
>> >         /* Bridges without IO support must wire the IO register to 0 */
>> >         if (io)
>> >                 b_res[0].flags |= IORESOURCE_IO;
>>
>> If an I/O window happened to be configured to the 4K region at 0x0,
>> this disables it.  An I/O window configured anywhere else is left
>> enabled.  Previously the only effect of the function was to set bits
>> in b_res[].flags; with this change it would also sometimes disable an
>> I/O window.  That seems worse to me.  Am I just misunderstanding the
>> problem you're solving?
>
> No, you've got it right. The problem I had is completely solved by
> PATCH 1/2 PCI: mvebu - The bridge should obey the MEM and IO command
> bits
>
> But the code looks odd to me.
>
> It is certainly correct as written if the check is only ever called
> with PCI_COMMAND_IO cleared.
>
> However, I am having trouble convincing myself of this. Notably
> pci_setup_bridge does not seem to make that assumption.

Yeah, I didn't see any place that clears or even checks PCI_COMMAND_IO
before doing this.   Originally this would only have been run at
boot-time, and there was probably no worry about concurrent accesses
that might get claimed while the window is temporarily configured at
0xf000.  But the boot-time only assumption no longer holds, so this
makes me uneasy.

> If PCI_COMMAND_IO is set, it can just assume IORESOURCE_IO and skip
> the testing of the PCI_IO_BASE register.
>
> Why 0xf0f0 anyhow? :)

I don't know why 0xf0f0.  Anything that's non-zero in the upper four
bits of each byte should work to determine writability.  It seems like
it would be safer to write 0x00f0 so the window is disabled, because
even if we're enumerating the bridge at hot-plug time, nobody else
should be trying to access a device behind the bridge, and we wouldn't
inadvertently claim something destined for [io 0xf000-0xffff].

Bjorn
diff mbox

Patch

diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
index f2d61f5..09fc586 100644
--- a/drivers/pci/host/pci-mvebu.c
+++ b/drivers/pci/host/pci-mvebu.c
@@ -132,6 +132,11 @@  struct mvebu_pcie_port {
 	size_t iowin_size;
 };
 
+static inline bool mvebu_has_ioport(struct mvebu_pcie_port *port)
+{
+	return port->io_target != -1 && port->io_attr != -1;
+}
+
 static bool mvebu_pcie_link_up(struct mvebu_pcie_port *port)
 {
 	return !(readl(port->base + PCIE_STAT_OFF) & PCIE_STAT_LINK_DOWN);
@@ -278,7 +283,7 @@  static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
 	phys_addr_t iobase;
 
 	/* Are the new iobase/iolimit values invalid? */
-	if (port->bridge.iolimit < port->bridge.iobase ||
+	if (port->bridge.iolimit <= port->bridge.iobase ||
 	    port->bridge.iolimitupper < port->bridge.iobaseupper) {
 
 		/* If a window was configured, remove it */
@@ -292,6 +297,12 @@  static void mvebu_pcie_handle_iobase_change(struct mvebu_pcie_port *port)
 		return;
 	}
 
+	if (!mvebu_has_ioport(port)) {
+		dev_WARN(&port->pcie->pdev->dev,
+			 "Attempt to set IO when IO is disabled\n");
+		return;
+	}
+
 	/*
 	 * We read the PCI-to-PCI bridge emulated registers, and
 	 * calculate the base address and size of the address decoding
@@ -405,9 +416,12 @@  static int mvebu_sw_pci_bridge_read(struct mvebu_pcie_port *port,
 		break;
 
 	case PCI_IO_BASE:
-		*value = (bridge->secondary_status << 16 |
-			  bridge->iolimit          <<  8 |
-			  bridge->iobase);
+		if (!mvebu_has_ioport(port))
+			*value = bridge->secondary_status << 16;
+		else
+			*value = (bridge->secondary_status << 16 |
+				  bridge->iolimit          <<  8 |
+				  bridge->iobase);
 		break;
 
 	case PCI_MEMORY_BASE:
@@ -465,6 +479,8 @@  static int mvebu_sw_pci_bridge_write(struct mvebu_pcie_port *port,
 	switch (where & ~3) {
 	case PCI_COMMAND:
 		bridge->command = value & 0xffff;
+		if (!mvebu_has_ioport(port))
+			bridge->command &= ~PCI_COMMAND_IO;
 		break;
 
 	case PCI_BASE_ADDRESS_0 ... PCI_BASE_ADDRESS_1:
@@ -630,7 +646,9 @@  static int __init mvebu_pcie_setup(int nr, struct pci_sys_data *sys)
 	struct mvebu_pcie *pcie = sys_to_pcie(sys);
 	int i;
 
-	pci_add_resource_offset(&sys->resources, &pcie->realio, sys->io_offset);
+	if (resource_size(&pcie->realio) != 0)
+		pci_add_resource_offset(&sys->resources, &pcie->realio,
+					sys->io_offset);
 	pci_add_resource_offset(&sys->resources, &pcie->mem, sys->mem_offset);
 	pci_add_resource(&sys->resources, &pcie->busn);
 
@@ -739,12 +757,17 @@  mvebu_pcie_map_registers(struct platform_device *pdev,
 #define DT_CPUADDR_TO_ATTR(cpuaddr)   (((cpuaddr) >> 48) & 0xFF)
 
 static int mvebu_get_tgt_attr(struct device_node *np, int devfn,
-			      unsigned long type, int *tgt, int *attr)
+			      unsigned long type,
+			      unsigned int *tgt,
+			      unsigned int *attr)
 {
 	const int na = 3, ns = 2;
 	const __be32 *range;
 	int rlen, nranges, rangesz, pna, i;
 
+	*tgt = -1;
+	*attr = -1;
+
 	range = of_get_property(np, "ranges", &rlen);
 	if (!range)
 		return -EINVAL;
@@ -798,16 +821,15 @@  static int __init mvebu_pcie_probe(struct platform_device *pdev)
 	}
 
 	mvebu_mbus_get_pcie_io_aperture(&pcie->io);
-	if (resource_size(&pcie->io) == 0) {
-		dev_err(&pdev->dev, "invalid I/O aperture size\n");
-		return -EINVAL;
-	}
 
-	pcie->realio.flags = pcie->io.flags;
-	pcie->realio.start = PCIBIOS_MIN_IO;
-	pcie->realio.end = min_t(resource_size_t,
-				  IO_SPACE_LIMIT,
-				  resource_size(&pcie->io));
+	if (resource_size(&pcie->io) != 0) {
+		pcie->realio.flags = pcie->io.flags;
+		pcie->realio.start = PCIBIOS_MIN_IO;
+		pcie->realio.end = min_t(resource_size_t,
+					 IO_SPACE_LIMIT,
+					 resource_size(&pcie->io));
+	} else
+		pcie->realio = pcie->io;
 
 	/* Get the bus range */
 	ret = of_pci_parse_bus_range(np, &pcie->busn);
@@ -864,12 +886,12 @@  static int __init mvebu_pcie_probe(struct platform_device *pdev)
 			continue;
 		}
 
-		ret = mvebu_get_tgt_attr(np, port->devfn, IORESOURCE_IO,
-					 &port->io_target, &port->io_attr);
-		if (ret < 0) {
-			dev_err(&pdev->dev, "PCIe%d.%d: cannot get tgt/attr for io window\n",
-				port->port, port->lane);
-			continue;
+		if (resource_size(&pcie->io) != 0)
+			mvebu_get_tgt_attr(np, port->devfn, IORESOURCE_IO,
+					   &port->io_target, &port->io_attr);
+		else {
+			port->io_target = -1;
+			port->io_attr = -1;
 		}
 
 		port->base = mvebu_pcie_map_registers(pdev, child, port);