Fixing PCIe issues on Armada XP

Message ID	alpine.DEB.2.10.1404102206090.28003@vroombuntu (mailing list archive)
State	New, archived
Delegated to:	Bjorn Helgaas
Headers	show Return-Path: <linux-pci-owner@kernel.org> Date: Thu, 10 Apr 2014 22:56:00 +0100 (BST) From: Neil Greatorex <neil@fatboyfat.co.uk> To: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>, Willy Tarreau <w@1wt.eu>, Matthew Minter <matthew_minter@xyratex.com>, Gerlando Falauto <gerlando.falauto@keymile.com>, linux-arm-kernel@lists.infradead.org, Jason Cooper <jason@lakedaemon.net>, =?ISO-8859-15?Q?Gregory_Cl=E9ment?= <gregory.clement@free-electrons.com>, Ezequiel Garcia <ezequiel.garcia@free-electrons.com>, Andrew Lunn <andrew@lunn.ch>, linux-pci@vger.kernel.org, Tawfik Bayouk <tawfik@marvell.com>, Lior Amsalem <alior@marvell.com> Subject: Re: Fixing PCIe issues on Armada XP In-Reply-To: <20140410201201.GA12661@obsidianresearch.com> Message-ID: <alpine.DEB.2.10.1404102206090.28003@vroombuntu> References: <20140410181953.50ccfcc3@skate> <20140410165733.GB23104@obsidianresearch.com> <20140410200153.46669e0c@skate> <20140410201201.GA12661@obsidianresearch.com> User-Agent: Alpine 2.10 (DEB 1266 2009-07-14) MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="8323329-2088357076-1397166964=:28003" Sender: linux-pci-owner@vger.kernel.org Precedence: bulk

Neil Greatorex April 10, 2014, 9:56 p.m. UTC

Jason,

On Thu, 10 Apr 2014, Jason Gunthorpe wrote:

> Gating the clock without disabling the Phy first does sound like a
> bad idea..
>
> Neil, does this do anything for you?
>
> diff --git a/arch/arm/mach-mvebu/mvebu-soc-id.c b/arch/arm/mach-mvebu/mvebu-soc-id.c
> index f3b325f..e0a032f 100644
> --- a/arch/arm/mach-mvebu/mvebu-soc-id.c
> +++ b/arch/arm/mach-mvebu/mvebu-soc-id.c
> @@ -107,7 +107,7 @@ static int __init mvebu_soc_id_init(void)
>        iounmap(pci_base);
>
> res_ioremap:
> -       clk_disable_unprepare(clk);
> +//     clk_disable_unprepare(clk);
>
> clk_err:
>        of_node_put(child);
>

That patch has fixed it for me. The PCIe card seems to be now be always 
properly detected.

> In any event, turning on the clock should almost certainly be
> accompanied by a phy reset sequence to get both link ends on the same
> page.
>
> Attached is a rough, untested patch along those lines.
>

I took your attached patch and extended it a bit to print out how long it 
took. The delays also need to be much longer for me. I also fixed a small 
typo you made where the bit wasn't being set again to bring the link back 
up. I've attached the diff to your patch, as well as the combined patch 
(hope that makes sense).

With the attached patch I get the following output:

mirabox ~ # dmesg | grep PCIe0.0
[    0.135947] mvebu-pcie pcie-controller.3: PCIe0.0: performing link 
reset
[    0.161989] mvebu-pcie pcie-controller.3: PCIe0.0: link went down after 
26 tries
[    0.173984] mvebu-pcie pcie-controller.3: PCIe0.0: link came back up 
after 12 tries
mirabox ~ # lspci
00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01)
00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710 (rev 01)
01:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network 
Connection (rev 01)
01:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network 
Connection (rev 01)
03:00.0 USB controller: Fresco Logic FL1009 USB 3.0 Host Controller (rev 
02)

So that seems to also work. I will leave it to you and Thomas to decide 
which approach is better!

Cheers,
Neil

Jason Gunthorpe April 10, 2014, 10:06 p.m. UTC | #1

On Thu, Apr 10, 2014 at 10:56:00PM +0100, Neil Greatorex wrote:

> >diff --git a/arch/arm/mach-mvebu/mvebu-soc-id.c b/arch/arm/mach-mvebu/mvebu-soc-id.c
> >index f3b325f..e0a032f 100644
> >+++ b/arch/arm/mach-mvebu/mvebu-soc-id.c
> >@@ -107,7 +107,7 @@ static int __init mvebu_soc_id_init(void)
> >       iounmap(pci_base);
> >
> >res_ioremap:
> >-       clk_disable_unprepare(clk);
> >+//     clk_disable_unprepare(clk);
> >
> >clk_err:
> >       of_node_put(child);
> >
> 
> That patch has fixed it for me. The PCIe card seems to be now be
> always properly detected.

Okay, that makes lots of sense to me at least.

Gregory: I have this vauge recollection it was discussed when you
wrote the mvebu_soc_id_init patch to use a __clk_is_enabled but got
shot down? Clearly this needs to be fixed.

> >In any event, turning on the clock should almost certainly be
> >accompanied by a phy reset sequence to get both link ends on the same
> >page.
> >
> >Attached is a rough, untested patch along those lines.
> >
> 
> I took your attached patch and extended it a bit to print out how
> long it took. The delays also need to be much longer for me. I also
> fixed a small typo you made where the bit wasn't being set again to
> bring the link back up. I've attached the diff to your patch, as
> well as the combined patch (hope that makes sense).

Just to be clear you tried this alone without the above?

Thanks for fixing the patch, I think it also confirms the theory.

IMHO, both approaches are required.

The first prevents messing up the PEX as was left by the bootloader

The second allows the driver to startup a PEX that wasn't enabled by
the bootloader, and recover from clock gating in the kernel (eg the
modular case)

Both seem valuable..

Ideally I'd like to see the clk driver turn off the PEX PHY when it
gates the clock, but I have no great idea how to accomplish that sort
of cross register space adventure...

Thomas, if we can figure out how to properly do __clk_is_enabled I
can probably send a proper patch?

Thanks,
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Neil Greatorex April 10, 2014, 10:15 p.m. UTC | #2

Jason,

On Thu, 10 Apr 2014, Jason Gunthorpe wrote:

>> I took your attached patch and extended it a bit to print out how
>> long it took. The delays also need to be much longer for me. I also
>> fixed a small typo you made where the bit wasn't being set again to
>> bring the link back up. I've attached the diff to your patch, as
>> well as the combined patch (hope that makes sense).
>
> Just to be clear you tried this alone without the above?

Yes I did.

>
> Thanks for fixing the patch, I think it also confirms the theory.
>

No problem!

> IMHO, both approaches are required.
>
> The first prevents messing up the PEX as was left by the bootloader
>
> The second allows the driver to startup a PEX that wasn't enabled by
> the bootloader, and recover from clock gating in the kernel (eg the
> modular case)
>
> Both seem valuable..
>
> Ideally I'd like to see the clk driver turn off the PEX PHY when it
> gates the clock, but I have no great idea how to accomplish that sort
> of cross register space adventure...
>
> Thomas, if we can figure out how to properly do __clk_is_enabled I
> can probably send a proper patch?

Let me know when you do and I will test it and add my Tested-by.

>
> Thanks,
> Jason
>

Cheers,
Neil

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thomas Petazzoni April 11, 2014, 10:23 a.m. UTC | #3

Dear Neil Greatorex,

On Thu, 10 Apr 2014 22:56:00 +0100 (BST), Neil Greatorex wrote:
> > In any event, turning on the clock should almost certainly be
> > accompanied by a phy reset sequence to get both link ends on the same
> > page.
> >
> > Attached is a rough, untested patch along those lines.
> >
> 
> I took your attached patch and extended it a bit to print out how long it 
> took. The delays also need to be much longer for me. I also fixed a small 
> typo you made where the bit wasn't being set again to bring the link back 
> up. I've attached the diff to your patch, as well as the combined patch 
> (hope that makes sense).

I managed to reproduce the problem of the PCIe device not being
detected on Mirabox when earlyprintk is disabled.

However, the proposed patch doesn't seem to work:

mvebu-pcie pcie-controller.3: PCIe0.0: performing link reset
mvebu-pcie pcie-controller.3: PCIe0.0: link went down after 100 tries
mvebu-pcie pcie-controller.3: PCIe0.0: link came back up after 0 tries

It waits hundred times for the link to go down, but it never goes down,
and then it doesn't wait at all to go up... because it never went down.

Moreover, I am not entirely convinced that a PHY reset is needed here.
In my tests, doing just a wait after setting the local dev number was
sufficient. The fact is works with earlyprintk also indicates that we
don't need to do any specific action with the PHY, just to wait a bit
of time. I am worried that resetting the PHY might actually take more
time than what is needed, and may have other consequences that we don't
necessarily understand at this point.

Thomas

Jason Gunthorpe April 11, 2014, 4:31 p.m. UTC | #4

On Fri, Apr 11, 2014 at 12:23:14PM +0200, Thomas Petazzoni wrote:

> On Thu, 10 Apr 2014 22:56:00 +0100 (BST), Neil Greatorex wrote:
> > > In any event, turning on the clock should almost certainly be
> > > accompanied by a phy reset sequence to get both link ends on the same
> > > page.
> > >
> > > Attached is a rough, untested patch along those lines.
> > >
> > 
> > I took your attached patch and extended it a bit to print out how long it 
> > took. The delays also need to be much longer for me. I also fixed a small 
> > typo you made where the bit wasn't being set again to bring the link back 
> > up. I've attached the diff to your patch, as well as the combined patch 
> > (hope that makes sense).
> 
> I managed to reproduce the problem of the PCIe device not being
> detected on Mirabox when earlyprintk is disabled.
> 
> However, the proposed patch doesn't seem to work:
> 
> mvebu-pcie pcie-controller.3: PCIe0.0: performing link reset
> mvebu-pcie pcie-controller.3: PCIe0.0: link went down after 100 tries
> mvebu-pcie pcie-controller.3: PCIe0.0: link came back up after 0 tries
> 
> It waits hundred times for the link to go down, but it never goes down,
> and then it doesn't wait at all to go up... because it never went down.

I wonder if bit 30 only disables link training, but doesn't force the
link down. There may be another bit that is required here.

Alternatively, are you seeing a different problem? If you apply the
hack to the socid does your symptom go away as well?

> Moreover, I am not entirely convinced that a PHY reset is needed here.
> In my tests, doing just a wait after setting the local dev number was
> sufficient. The fact is works with earlyprintk also indicates that we
> don't need to do any specific action with the PHY, just to wait a bit
> of time. I am worried that resetting the PHY might actually take more
> time than what is needed, and may have other consequences that we don't
> necessarily understand at this point.

I really disagree - clearly the fundamental problem is suspending one
side of the PEX link while the other remains operating. That isn't
specified to work and really shouldn't work.

If you suspend one side of the PEX you *MUST* reset the
link. Absolutely. No Doubt In My Mind.

Remember, PCI-E is a serial shared state protocol. The two sides must
remain in sync or a link reset is required to recover the shared
state. Halting one side obviously destroys this invariant.

I think in many cases the reset happens autonomously. The remote side
will force it to happen. This is what the debugging from Neil shows -
the link just resets at some inconvenient point. Now we know why.

The timing sensitivity also makes sense, if you suspend for a very
short time window the other side might not notice. If you suspend for
longer the other side will reset the link autonomously and your local
side will quickly notice the reset once it comes back.

If you suspend for a little bit the link might not retrain immediately
but the shared state can become corrupted (eg sequence number
mismatch) this will eventually trigger a reset, after the local PEX
has been operating again.

The LinkUp bit after resume is also clearly a lie - most likely the
PEX takes some time to detect the change in remote state to trigger a
link down. After all, it was suspended while the remote was busy
trying to recover.

The fundamental problem here is the clock driver. It should not be
gating PEX clocks so naively. A PEX suspend needs to be sequenced to
ensure the link is cleanly brought down before the PEX is put to
sleep. That way it can cleanly and unambiguously be started up when it
resumes. No risk of glitching/corrupting the far side with some kind
of crap on the serial bus.

In any event, the most important required patch here is one that fixes
socid. It must not turn off the PEX clock. Then we can talk about how
to fix pci-mvebu to work as a module...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Matthew Minter April 11, 2014, 5:21 p.m. UTC | #5

Hi Guys,

I just wanted to confirm that both V1 and V2 of the patch set fixes my
problems, I still have an odd bug where a certain peripheral causes
the link to drop down from x4 to x1 but I think that is far more
likely a hardware issue with the device as it does not happen with any
other peripheral. So to confirm you can put my "tested by" on the
patches if you like.

I would also like to note, my board uses an external clock for the PCI
ports, thus it is unlikely to be effected by any issues relating to
clock drivers.

Please keep me up to date if there are any new revisions of the set
and I will try and test them.

Many thanks,
Matt

On 11 April 2014 17:31, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote:
> On Fri, Apr 11, 2014 at 12:23:14PM +0200, Thomas Petazzoni wrote:
>
>> On Thu, 10 Apr 2014 22:56:00 +0100 (BST), Neil Greatorex wrote:
>> > > In any event, turning on the clock should almost certainly be
>> > > accompanied by a phy reset sequence to get both link ends on the same
>> > > page.
>> > >
>> > > Attached is a rough, untested patch along those lines.
>> > >
>> >
>> > I took your attached patch and extended it a bit to print out how long it
>> > took. The delays also need to be much longer for me. I also fixed a small
>> > typo you made where the bit wasn't being set again to bring the link back
>> > up. I've attached the diff to your patch, as well as the combined patch
>> > (hope that makes sense).
>>
>> I managed to reproduce the problem of the PCIe device not being
>> detected on Mirabox when earlyprintk is disabled.
>>
>> However, the proposed patch doesn't seem to work:
>>
>> mvebu-pcie pcie-controller.3: PCIe0.0: performing link reset
>> mvebu-pcie pcie-controller.3: PCIe0.0: link went down after 100 tries
>> mvebu-pcie pcie-controller.3: PCIe0.0: link came back up after 0 tries
>>
>> It waits hundred times for the link to go down, but it never goes down,
>> and then it doesn't wait at all to go up... because it never went down.
>
> I wonder if bit 30 only disables link training, but doesn't force the
> link down. There may be another bit that is required here.
>
> Alternatively, are you seeing a different problem? If you apply the
> hack to the socid does your symptom go away as well?
>
>> Moreover, I am not entirely convinced that a PHY reset is needed here.
>> In my tests, doing just a wait after setting the local dev number was
>> sufficient. The fact is works with earlyprintk also indicates that we
>> don't need to do any specific action with the PHY, just to wait a bit
>> of time. I am worried that resetting the PHY might actually take more
>> time than what is needed, and may have other consequences that we don't
>> necessarily understand at this point.
>
> I really disagree - clearly the fundamental problem is suspending one
> side of the PEX link while the other remains operating. That isn't
> specified to work and really shouldn't work.
>
> If you suspend one side of the PEX you *MUST* reset the
> link. Absolutely. No Doubt In My Mind.
>
> Remember, PCI-E is a serial shared state protocol. The two sides must
> remain in sync or a link reset is required to recover the shared
> state. Halting one side obviously destroys this invariant.
>
> I think in many cases the reset happens autonomously. The remote side
> will force it to happen. This is what the debugging from Neil shows -
> the link just resets at some inconvenient point. Now we know why.
>
> The timing sensitivity also makes sense, if you suspend for a very
> short time window the other side might not notice. If you suspend for
> longer the other side will reset the link autonomously and your local
> side will quickly notice the reset once it comes back.
>
> If you suspend for a little bit the link might not retrain immediately
> but the shared state can become corrupted (eg sequence number
> mismatch) this will eventually trigger a reset, after the local PEX
> has been operating again.
>
> The LinkUp bit after resume is also clearly a lie - most likely the
> PEX takes some time to detect the change in remote state to trigger a
> link down. After all, it was suspended while the remote was busy
> trying to recover.
>
> The fundamental problem here is the clock driver. It should not be
> gating PEX clocks so naively. A PEX suspend needs to be sequenced to
> ensure the link is cleanly brought down before the PEX is put to
> sleep. That way it can cleanly and unambiguously be started up when it
> resumes. No risk of glitching/corrupting the far side with some kind
> of crap on the serial bus.
>
> In any event, the most important required patch here is one that fixes
> socid. It must not turn off the PEX clock. Then we can talk about how
> to fix pci-mvebu to work as a module...
>
> Jason

Jason Gunthorpe April 11, 2014, 5:29 p.m. UTC | #6

On Fri, Apr 11, 2014 at 06:21:08PM +0100, Matthew Minter wrote:

> I would also like to note, my board uses an external clock for the
> PCI ports, thus it is unlikely to be effected by any issues relating
> to clock drivers.

It doesn't matter, the clock in question is the internal divided CPU
synchronous clock that drives the PEX TLP logic. Externally sourcing
the PHY clock won't change the synchronous side.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Thomas Petazzoni April 18, 2014, 12:58 p.m. UTC | #7

Neil, Jason,

On Thu, 10 Apr 2014 22:56:00 +0100 (BST), Neil Greatorex wrote:

> I took your attached patch and extended it a bit to print out how long it 
> took. The delays also need to be much longer for me. I also fixed a small 
> typo you made where the bit wasn't being set again to bring the link back 
> up. I've attached the diff to your patch, as well as the combined patch 
> (hope that makes sense).

Unfortunately here your patch doesn't work (and neither does the patch
from Jason Gunthorpe). On Armada 370 DB, without the patch, the e1000e
NIC is detected when earlyprintk is enabled, and not detected when
earlyprintk is disabled. With the patch applied, the e1000e is no
longer detected *at all*, even if earlyprintk is enabled.

Extract from a boot log:

Linux version 3.15.0-rc1-00007-gedf643a-dirty (thomas@skate) (gcc version 4.8.1 (Ubuntu/Linaro 4.8.1-10ubuntu7) ) #317 SMP Fri Apr 18 14:54:13 CEST 2014
[...]
Kernel command line: console=ttyS0,115200 earlyprintk loglevel=8 root=/dev/nfs nfsroot=192.168.1.22:/home/thomas/nfsroot ip=192.168.1.142:192.168.1.22:192.168.1.1:255.255.255.0:devboard:eth0:on
[...]
mvebu-pcie pcie-controller.2: PCIe0.0: performing link reset
mvebu-pcie pcie-controller.2: PCIe0.0: link went down after 20 tries
mvebu-pcie pcie-controller.2: PCIe0.0: link came back up after 100 tries
mvebu-pcie pcie-controller.2: PCI host bridge to bus 0000:00
pci_bus 0000:00: root bus resource [io  0x1000-0xfffff]
pci_bus 0000:00: root bus resource [mem 0xf8000000-0xffdfffff]
pci_bus 0000:00: root bus resource [bus 00-ff]
pci 0000:00:01.0: [11ab:6710] type 01 class 0x060400
pci 0000:00:02.0: [11ab:6710] type 01 class 0x060400
PCI: bus0: Fast back to back transfers disabled
pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
pci 0000:00:02.0: bridge configuration invalid ([bus 00-00]), reconfiguring
PCI: bus1: Fast back to back transfers enabled
pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
PCI: bus2: Fast back to back transfers enabled
pci_bus 0000:02: busn_res: [bus 02-ff] end is updated to 02
pci 0000:00:01.0: PCI bridge to [bus 01]
pci 0000:00:02.0: PCI bridge to [bus 02]
[...]
# /usr/sbin/lspci 
00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710
00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710
#

Any idea?

Thomas

Thomas Petazzoni April 18, 2014, 1:02 p.m. UTC | #8

Dear Jason Gunthorpe,

On Fri, 11 Apr 2014 10:31:29 -0600, Jason Gunthorpe wrote:

> > Moreover, I am not entirely convinced that a PHY reset is needed here.
> > In my tests, doing just a wait after setting the local dev number was
> > sufficient. The fact is works with earlyprintk also indicates that we
> > don't need to do any specific action with the PHY, just to wait a bit
> > of time. I am worried that resetting the PHY might actually take more
> > time than what is needed, and may have other consequences that we don't
> > necessarily understand at this point.
> 
> I really disagree - clearly the fundamental problem is suspending one
> side of the PEX link while the other remains operating. That isn't
> specified to work and really shouldn't work.
> 
> If you suspend one side of the PEX you *MUST* reset the
> link. Absolutely. No Doubt In My Mind.
> 
> Remember, PCI-E is a serial shared state protocol. The two sides must
> remain in sync or a link reset is required to recover the shared
> state. Halting one side obviously destroys this invariant.
> 
> I think in many cases the reset happens autonomously. The remote side
> will force it to happen. This is what the debugging from Neil shows -
> the link just resets at some inconvenient point. Now we know why.
> 
> The timing sensitivity also makes sense, if you suspend for a very
> short time window the other side might not notice. If you suspend for
> longer the other side will reset the link autonomously and your local
> side will quickly notice the reset once it comes back.
> 
> If you suspend for a little bit the link might not retrain immediately
> but the shared state can become corrupted (eg sequence number
> mismatch) this will eventually trigger a reset, after the local PEX
> has been operating again.
> 
> The LinkUp bit after resume is also clearly a lie - most likely the
> PEX takes some time to detect the change in remote state to trigger a
> link down. After all, it was suspended while the remote was busy
> trying to recover.

Ok, fair enough. However, Neil's patch (which is basically your patch
with longer delays) isn't working here, as I just reported in a
separate e-mail.

So we don't have a solution right now. Do you have another proposal to
try ?

> The fundamental problem here is the clock driver. It should not be
> gating PEX clocks so naively. A PEX suspend needs to be sequenced to
> ensure the link is cleanly brought down before the PEX is put to
> sleep. That way it can cleanly and unambiguously be started up when it
> resumes. No risk of glitching/corrupting the far side with some kind
> of crap on the serial bus.
> 
> In any event, the most important required patch here is one that fixes
> socid. It must not turn off the PEX clock. Then we can talk about how
> to fix pci-mvebu to work as a module...

I don't really understand this: all clocks are gated at boot time, so
regardless of mvebu-soc-id behavior, the PCIe driver should take care
of doing all the necessary initialization without making the
assumptions that the clocks were left turned on from the bootloader
time. This is needed if we want to support pci-mvebu as a module, so I
don't see why hacking mvebu-soc-id is going to solve this.

Thomas

Jason Gunthorpe April 22, 2014, 5:34 p.m. UTC | #9

On Fri, Apr 18, 2014 at 03:02:44PM +0200, Thomas Petazzoni wrote:

> > The LinkUp bit after resume is also clearly a lie - most likely the
> > PEX takes some time to detect the change in remote state to trigger a
> > link down. After all, it was suspended while the remote was busy
> > trying to recover.
> 
> Ok, fair enough. However, Neil's patch (which is basically your patch
> with longer delays) isn't working here, as I just reported in a
> separate e-mail.
> 
> So we don't have a solution right now. Do you have another proposal to
> try ?

Do you have time to see if we can isolate the difference on your
system?

> > The fundamental problem here is the clock driver. It should not be
> > gating PEX clocks so naively. A PEX suspend needs to be sequenced to
> > ensure the link is cleanly brought down before the PEX is put to
> > sleep. That way it can cleanly and unambiguously be started up when it
> > resumes. No risk of glitching/corrupting the far side with some kind
> > of crap on the serial bus.
> > 
> > In any event, the most important required patch here is one that fixes
> > socid. It must not turn off the PEX clock. Then we can talk about how
> > to fix pci-mvebu to work as a module...
 
> I don't really understand this: all clocks are gated at boot time, so
> regardless of mvebu-soc-id behavior, the PCIe driver should take care
> of doing all the necessary initialization without making the
> assumptions that the clocks were left turned on from the bootloader

The current mvebu-soc-id makes it impossible to do a clean hand off of
the boot loader PEX setup to the PEX driver. I think that is a
problem. Certainly if we fix it non-modular PEX will start working
reliably.

Keep in mind the current driver cannot startup a PEX without
bootloader support. It does not clear the set-at-reset
Conf_TrainingDisable bit, and it doesn't wait for a link to come up
after doing so.

> time. This is needed if we want to support pci-mvebu as a module, so I
> don't see why hacking mvebu-soc-id is going to solve this.

I agree we should try to figure the modular case out, but it looks to
me that suspending the PEX is a bigger job that just gating the
clock..

The automatic gating of the PEX clocks should be more clever. If there
are no DT elements that reference the clock it can be disabled,
otherwise we should probably leave it enabled and rely on the PEX
driver to do power management once it gets loaded.

So broadly, my thinking is the following largely orthogonal items:
 1) PEX's that are setup by the bootloader must be cleanly handed off
    to the driver. The clocks must never gate.
 2) The driver should learn to turn on a PEX from either the
    Conf_TrainingDisable=1 state or the clock gated state
 3) PEX clocks that are never used in DT can be safely shutoff,
    otherwise they must remain on
 4) The PEX driver can learn to properly suspend the PEX for power
    savings, via the normal Linux power management stuff.

What do you think?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jason Gunthorpe April 22, 2014, 5:56 p.m. UTC | #10

> mvebu-pcie pcie-controller.2: PCIe0.0: performing link reset
> mvebu-pcie pcie-controller.2: PCIe0.0: link went down after 20 tries
> mvebu-pcie pcie-controller.2: PCIe0.0: link came back up after 100 tries

So the '100' here says the timeout expired. My first stab would be to
set the timeout longer.. Maybe your chip takes longer to train the
link.

You also mentioned that setting Conf_TrainingDisable doesn't cause the
link to go down? That would be useful to verify (in a clean
environment without clock gating weirdness..)

> # /usr/sbin/lspci 
> 00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 6710
> 00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 6710

I'm guessing:

echo 1 > /sys/bus/pci/rescan

Will make it appear?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Fixing PCIe issues on Armada XP

Commit Message

Comments

Patch