diff mbox

[BUG] FL1009: xHCI host not responding to stop endpoint command.

Message ID 20140121211741.GA12657@xanatos (mailing list archive)
State New, archived
Headers show

Commit Message

Sarah Sharp Jan. 21, 2014, 9:17 p.m. UTC
On Sat, Jan 18, 2014 at 10:49:17PM +0100, Arnaud Ebalard wrote:
> Hi,
> 
> I have added Thomas in the recipients, because I guess he may be of some
> help debugging the issue further. Thomas, the beginning of the thread is
> here: http://thread.gmane.org/gmane.linux.usb.general/101531

...

> I started suspecting the introduction of MSI support in Marvell PCIe
> host controller driver (FL1009 is on the PCIe bus) and compiled a
> a 3.13.0-rc8 w/ CONFIG_PCI_MSI disabled (it was enabled in all my
> previous tests): I did not manage to reproduce the issue with this
> kernel. As a side note, commits 5b4deb6526bd, 31f614edb726 and
> 627dfcc249e2 are
> 
> ATM, I do not know if the problem is related to a bug in introduced MSI
> support or some weird incompatibility of that functionality with the
> FL1009 which would require some quirk in XHCI stack.

We've actually had issues in the past with Fresco Logic hosts not
supporting MSI properly, even though the PCI devices claim to have MSI
support.  So turning off CONFIG_PCI_MSI may actually mean the Fresco
Logic host is to blame, rather than the Marvell patches.  I assume MSI
wouldn't have been turned on for the Fresco Logic host unless the parent
PCI host controller supported it.

Let's see if the Fresco Logic host is really the root cause.  Please
apply the this patch to 3.13.0-rc8 and recompile with CONFIG_PCI_MSI
enabled:


Sarah Sharp

Comments

Arnaud Ebalard Jan. 22, 2014, 10:23 p.m. UTC | #1
Hi Sarah,

Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:

> On Sat, Jan 18, 2014 at 10:49:17PM +0100, Arnaud Ebalard wrote:
>> Hi,
>> 
>> I have added Thomas in the recipients, because I guess he may be of some
>> help debugging the issue further. Thomas, the beginning of the thread is
>> here: http://thread.gmane.org/gmane.linux.usb.general/101531
>
> ...
>
>> I started suspecting the introduction of MSI support in Marvell PCIe
>> host controller driver (FL1009 is on the PCIe bus) and compiled a
>> a 3.13.0-rc8 w/ CONFIG_PCI_MSI disabled (it was enabled in all my
>> previous tests): I did not manage to reproduce the issue with this
>> kernel. As a side note, commits 5b4deb6526bd, 31f614edb726 and
>> 627dfcc249e2 are
>> 
>> ATM, I do not know if the problem is related to a bug in introduced MSI
>> support or some weird incompatibility of that functionality with the
>> FL1009 which would require some quirk in XHCI stack.
>
> We've actually had issues in the past with Fresco Logic hosts not
> supporting MSI properly, even though the PCI devices claim to have MSI
> support.  So turning off CONFIG_PCI_MSI may actually mean the Fresco
> Logic host is to blame, rather than the Marvell patches.  I assume MSI
> wouldn't have been turned on for the Fresco Logic host unless the parent
> PCI host controller supported it.
>
> Let's see if the Fresco Logic host is really the root cause.  Please
> apply the this patch to 3.13.0-rc8 and recompile with CONFIG_PCI_MSI
> enabled:
>
> diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
> index 6c03584ac15f..74748444c040 100644
> --- a/drivers/usb/host/xhci-pci.c
> +++ b/drivers/usb/host/xhci-pci.c
> @@ -30,6 +30,7 @@
>  /* Device for a quirk */
>  #define PCI_VENDOR_ID_FRESCO_LOGIC	0x1b73
>  #define PCI_DEVICE_ID_FRESCO_LOGIC_PDK	0x1000
> +#define PCI_DEVICE_ID_FRESCO_LOGIC_FL1009	0x1009
>  #define PCI_DEVICE_ID_FRESCO_LOGIC_FL1400	0x1400
>  
>  #define PCI_VENDOR_ID_ETRON		0x1b6f
> @@ -63,6 +64,9 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
>  
>  	/* Look for vendor-specific quirks */
>  	if (pdev->vendor == PCI_VENDOR_ID_FRESCO_LOGIC &&
> +			pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_FL1009)
> +		xhci->quirks |= XHCI_BROKEN_MSI;
> +	if (pdev->vendor == PCI_VENDOR_ID_FRESCO_LOGIC &&
>  			(pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_PDK ||
>  			 pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_FL1400)) {
>  		if (pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_PDK &&

With the patch applied on top of 3.13.0 kernel recompiled w/
CONFIG_PCI_MSI enabled, I cannot reproduce the bug. I guess
you can add my:

 Reported-and-tested-By: Arnaud Ebalard <arno@natisbad.org>

Since you'll have to push the patch to -stable team at least for 3.13,
I wonder if it would not make sense to extend that at least to 3.12.
and possibly 3.10 (3.2 is still widely used but I wonder if it makes
sense to go that far).

Cheers,

a+
Jason Cooper Jan. 22, 2014, 10:26 p.m. UTC | #2
On Wed, Jan 22, 2014 at 11:23:23PM +0100, Arnaud Ebalard wrote:
> With the patch applied on top of 3.13.0 kernel recompiled w/
> CONFIG_PCI_MSI enabled, I cannot reproduce the bug. I guess
> you can add my:
> 
>  Reported-and-tested-By: Arnaud Ebalard <arno@natisbad.org>
> 
> Since you'll have to push the patch to -stable team at least for 3.13,
> I wonder if it would not make sense to extend that at least to 3.12.
> and possibly 3.10 (3.2 is still widely used but I wonder if it makes
> sense to go that far).

Can you pinpoint the commit that introduced the regression?

thx,

Jason.
Arnaud Ebalard Jan. 22, 2014, 10:43 p.m. UTC | #3
Hi Jason,

Jason Cooper <jason@lakedaemon.net> writes:

> On Wed, Jan 22, 2014 at 11:23:23PM +0100, Arnaud Ebalard wrote:
>> With the patch applied on top of 3.13.0 kernel recompiled w/
>> CONFIG_PCI_MSI enabled, I cannot reproduce the bug. I guess
>> you can add my:
>> 
>>  Reported-and-tested-By: Arnaud Ebalard <arno@natisbad.org>
>> 
>> Since you'll have to push the patch to -stable team at least for 3.13,
>> I wonder if it would not make sense to extend that at least to 3.12.
>> and possibly 3.10 (3.2 is still widely used but I wonder if it makes
>> sense to go that far).
>
> Can you pinpoint the commit that introduced the regression?

f5182b4155b9d686c5540a6822486400e34ddd98 "xhci: Disable MSI for some Fresco Logic hosts."

Technically, this is not per se the commit which introduced the
regression but the one that *partially* fixed it by introducing the XHCI
quirk to skip MSI enabling for Fresco Logic chips. The thing is it
should have included the FL1009 in the targets. Sarah, can you confirm
this?

Jason, the logic is summarized here, AFAICT:

commit 455f58925247e8a1a1941e159f3636ad6ee4c90b
Author: Oliver Neukum <oneukum@suse.de>
Date:   Mon Sep 30 15:50:54 2013 +0200

    xhci: quirk for extra long delay for S4
    
    It has been reported that this chipset really cannot
    sleep without this extraordinary delay.
    
    This patch should be backported, in order to ensure this host functions
    under stable kernels.  The last quirk for Fresco Logic hosts (commit
    bba18e33f25072ebf70fd8f7f0cdbf8cdb59a746 "xhci: Extend Fresco Logic MSI
    quirk.") was backported to stable kernels as old as 2.6.36.
    
    Signed-off-by: Oliver Neukum <oneukum@suse.de>
    Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
    Cc: stable@vger.kernel.org


commit bba18e33f25072ebf70fd8f7f0cdbf8cdb59a746
Author: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Date:   Wed Oct 17 13:44:06 2012 -0700

    xhci: Extend Fresco Logic MSI quirk.
    
    Ali reports that plugging a device into the Fresco Logic xHCI host with
    PCI device ID 1400 produces an IRQ error:
    
     do_IRQ: 3.176 No irq handler for vector (irq -1)
    
    Other early Fresco Logic host revisions don't support MSI, even though
    their PCI config space claims they do.  Extend the quirk to disabling
    MSI to this chipset revision.  Also enable the short transfer quirk,
    since it's likely this revision also has that quirk, and it should be
    harmless to enable.
    
    <SNIP>

    This patch should be backported to stable kernels as old as 2.6.36, that
    contain the commit f5182b4155b9d686c5540a6822486400e34ddd98 "xhci:
    Disable MSI for some Fresco Logic hosts."
    
    Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
    Reported-by: A Sh <smr.ash1991@gmail.com>
    Tested-by: A Sh <smr.ash1991@gmail.com>
    Cc: stable@vger.kernel.org


commit f5182b4155b9d686c5540a6822486400e34ddd98
Author: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Date:   Thu Jun 2 11:33:02 2011 -0700

    xhci: Disable MSI for some Fresco Logic hosts.
    
    Some Fresco Logic hosts, including those found in the AUAU N533V laptop,
    advertise MSI, but fail to actually generate MSI interrupts.  Add a new
    xHCI quirk to skip MSI enabling for the Fresco Logic host controllers.
    Fresco Logic confirms that all chips with PCI vendor ID 0x1b73 and device
    ID 0x1000, regardless of PCI revision ID, do not support MSI.
    
    This should be backported to stable kernels as far back as 2.6.36, which
    was the first kernel to support MSI on xHCI hosts.
    
    Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
    Reported-by: Sergey Galanov <sergey.e.galanov@gmail.com>
    Cc: stable@kernel.org
Sarah Sharp Jan. 22, 2014, 11:56 p.m. UTC | #4
On Wed, Jan 22, 2014 at 11:43:16PM +0100, Arnaud Ebalard wrote:
> Hi Jason,
> 
> Jason Cooper <jason@lakedaemon.net> writes:
> 
> > On Wed, Jan 22, 2014 at 11:23:23PM +0100, Arnaud Ebalard wrote:
> >> With the patch applied on top of 3.13.0 kernel recompiled w/
> >> CONFIG_PCI_MSI enabled, I cannot reproduce the bug. I guess
> >> you can add my:
> >> 
> >>  Reported-and-tested-By: Arnaud Ebalard <arno@natisbad.org>
> >> 
> >> Since you'll have to push the patch to -stable team at least for 3.13,
> >> I wonder if it would not make sense to extend that at least to 3.12.
> >> and possibly 3.10 (3.2 is still widely used but I wonder if it makes
> >> sense to go that far).
> >
> > Can you pinpoint the commit that introduced the regression?
> 
> f5182b4155b9d686c5540a6822486400e34ddd98 "xhci: Disable MSI for some Fresco Logic hosts."
> 
> Technically, this is not per se the commit which introduced the
> regression but the one that *partially* fixed it by introducing the XHCI
> quirk to skip MSI enabling for Fresco Logic chips. The thing is it
> should have included the FL1009 in the targets. Sarah, can you confirm
> this?

I don't know if it should have included FL1009, it was just a guess,
based on the fact that the 0x1000 and 0x1400 devices did need MSI
disabled.  I can attempt to ask the Fresco Logic folks I know, but I'm
not sure if/when I'll get a response back.

That still doesn't necessarily rule out MSI issues in the Marvell PCI
host controller code.  Can you attach another PCI device with MSI
support under the host and see if it works?

Sarah Sharp

> Jason, the logic is summarized here, AFAICT:
> 
> commit 455f58925247e8a1a1941e159f3636ad6ee4c90b
> Author: Oliver Neukum <oneukum@suse.de>
> Date:   Mon Sep 30 15:50:54 2013 +0200
> 
>     xhci: quirk for extra long delay for S4
>     
>     It has been reported that this chipset really cannot
>     sleep without this extraordinary delay.
>     
>     This patch should be backported, in order to ensure this host functions
>     under stable kernels.  The last quirk for Fresco Logic hosts (commit
>     bba18e33f25072ebf70fd8f7f0cdbf8cdb59a746 "xhci: Extend Fresco Logic MSI
>     quirk.") was backported to stable kernels as old as 2.6.36.
>     
>     Signed-off-by: Oliver Neukum <oneukum@suse.de>
>     Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
>     Cc: stable@vger.kernel.org
> 
> 
> commit bba18e33f25072ebf70fd8f7f0cdbf8cdb59a746
> Author: Sarah Sharp <sarah.a.sharp@linux.intel.com>
> Date:   Wed Oct 17 13:44:06 2012 -0700
> 
>     xhci: Extend Fresco Logic MSI quirk.
>     
>     Ali reports that plugging a device into the Fresco Logic xHCI host with
>     PCI device ID 1400 produces an IRQ error:
>     
>      do_IRQ: 3.176 No irq handler for vector (irq -1)
>     
>     Other early Fresco Logic host revisions don't support MSI, even though
>     their PCI config space claims they do.  Extend the quirk to disabling
>     MSI to this chipset revision.  Also enable the short transfer quirk,
>     since it's likely this revision also has that quirk, and it should be
>     harmless to enable.
>     
>     <SNIP>
> 
>     This patch should be backported to stable kernels as old as 2.6.36, that
>     contain the commit f5182b4155b9d686c5540a6822486400e34ddd98 "xhci:
>     Disable MSI for some Fresco Logic hosts."
>     
>     Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
>     Reported-by: A Sh <smr.ash1991@gmail.com>
>     Tested-by: A Sh <smr.ash1991@gmail.com>
>     Cc: stable@vger.kernel.org
> 
> 
> commit f5182b4155b9d686c5540a6822486400e34ddd98
> Author: Sarah Sharp <sarah.a.sharp@linux.intel.com>
> Date:   Thu Jun 2 11:33:02 2011 -0700
> 
>     xhci: Disable MSI for some Fresco Logic hosts.
>     
>     Some Fresco Logic hosts, including those found in the AUAU N533V laptop,
>     advertise MSI, but fail to actually generate MSI interrupts.  Add a new
>     xHCI quirk to skip MSI enabling for the Fresco Logic host controllers.
>     Fresco Logic confirms that all chips with PCI vendor ID 0x1b73 and device
>     ID 0x1000, regardless of PCI revision ID, do not support MSI.
>     
>     This should be backported to stable kernels as far back as 2.6.36, which
>     was the first kernel to support MSI on xHCI hosts.
>     
>     Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
>     Reported-by: Sergey Galanov <sergey.e.galanov@gmail.com>
>     Cc: stable@kernel.org
Arnaud Ebalard Jan. 23, 2014, 8:24 a.m. UTC | #5
Hi Sarah,

Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:

>> > Can you pinpoint the commit that introduced the regression?
>> 
>> f5182b4155b9d686c5540a6822486400e34ddd98 "xhci: Disable MSI for some Fresco Logic hosts."
>> 
>> Technically, this is not per se the commit which introduced the
>> regression but the one that *partially* fixed it by introducing the XHCI
>> quirk to skip MSI enabling for Fresco Logic chips. The thing is it
>> should have included the FL1009 in the targets. Sarah, can you confirm
>> this?
>
> I don't know if it should have included FL1009, it was just a guess,
> based on the fact that the 0x1000 and 0x1400 devices did need MSI
> disabled.  I can attempt to ask the Fresco Logic folks I know, but I'm
> not sure if/when I'll get a response back.
>
> That still doesn't necessarily rule out MSI issues in the Marvell PCI
> host controller code.  Can you attach another PCI device with MSI
> support under the host and see if it works?

The various Armada-based devices I have are NAS which do not have PCIe
slots to plug additional devices (everything is soldered). I don't know
which device Thomas used for its tests. Just in case, I also added Willy
in CC: who have various boards and may also have done more test with
additional PCIe devices and CONFIG_PCI_MSI enabled on 3.13 kernel.

Cheers,

a+
Willy Tarreau Jan. 23, 2014, 11:09 a.m. UTC | #6
On Thu, Jan 23, 2014 at 09:24:41AM +0100, Arnaud Ebalard wrote:
> Hi Sarah,
> 
> Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:
> 
> >> > Can you pinpoint the commit that introduced the regression?
> >> 
> >> f5182b4155b9d686c5540a6822486400e34ddd98 "xhci: Disable MSI for some Fresco Logic hosts."
> >> 
> >> Technically, this is not per se the commit which introduced the
> >> regression but the one that *partially* fixed it by introducing the XHCI
> >> quirk to skip MSI enabling for Fresco Logic chips. The thing is it
> >> should have included the FL1009 in the targets. Sarah, can you confirm
> >> this?
> >
> > I don't know if it should have included FL1009, it was just a guess,
> > based on the fact that the 0x1000 and 0x1400 devices did need MSI
> > disabled.  I can attempt to ask the Fresco Logic folks I know, but I'm
> > not sure if/when I'll get a response back.
> >
> > That still doesn't necessarily rule out MSI issues in the Marvell PCI
> > host controller code.  Can you attach another PCI device with MSI
> > support under the host and see if it works?
> 
> The various Armada-based devices I have are NAS which do not have PCIe
> slots to plug additional devices (everything is soldered). I don't know
> which device Thomas used for its tests. Just in case, I also added Willy
> in CC: who have various boards and may also have done more test with
> additional PCIe devices and CONFIG_PCI_MSI enabled on 3.13 kernel.

I've been running an intel i350 dual-port NIC (igb driver) supporting
MSI on the mirabox, and it used to work in 3.10+many of the patches
coming from the Free-electrons team. Some recent changes to the PCI
code introduced a regression preventing this driver from correctly
registering an MSI interrupt, and I did not have enough time to
investigate it deep enough to fix it. That said, I know how to hack
it to work again, so if it can be of any use, I can run a test on
the mirabox (armada370) and on the XPGP board (armadaXP).

Willy
Thomas Petazzoni Jan. 26, 2014, 1:30 p.m. UTC | #7
Dear Arnaud Ebalard,

On Thu, 23 Jan 2014 09:24:41 +0100, Arnaud Ebalard wrote:

> The various Armada-based devices I have are NAS which do not have PCIe
> slots to plug additional devices (everything is soldered). I don't know
> which device Thomas used for its tests. Just in case, I also added Willy
> in CC: who have various boards and may also have done more test with
> additional PCIe devices and CONFIG_PCI_MSI enabled on 3.13 kernel.

The device I've used to test MSI is a e1000e PCIe Intel network card.
It uses one MSI interrupt, so admittedly, the MSI testing is quite
limited for now.

Thomas
Arnaud Ebalard Jan. 27, 2014, 6:36 p.m. UTC | #8
Hi Thomas and Sarah,

Thomas Petazzoni <thomas.petazzoni@free-electrons.com> writes:

> On Thu, 23 Jan 2014 09:24:41 +0100, Arnaud Ebalard wrote:
>
>> The various Armada-based devices I have are NAS which do not have PCIe
>> slots to plug additional devices (everything is soldered). I don't know
>> which device Thomas used for its tests. Just in case, I also added Willy
>> in CC: who have various boards and may also have done more test with
>> additional PCIe devices and CONFIG_PCI_MSI enabled on 3.13 kernel.
>
> The device I've used to test MSI is a e1000e PCIe Intel network card.
> It uses one MSI interrupt, so admittedly, the MSI testing is quite
> limited for now.

I had a second thought this WE about a previous question from Sarah: my
platforms do not have a PCIe extension slots to test other devices but
the RN102 does have an additional device connected on the PCIe bus: a
Marvell 88SE9170 SATA Controller. I have put below the output of lspci
-vvv (on a 3.13.0-rc8 kernel w/ CONFIG_PCI_MSI enabled) in case you can
spot something obviously wrong in it:

00:01.0 PCI bridge: Marvell Technology Group Ltd. Device 7846 (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 00010000-00010fff
	Memory behind bridge: e0000000-e00fffff
	Prefetchable memory behind bridge: 00000000-000fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-

00:02.0 PCI bridge: Marvell Technology Group Ltd. Device 7846 (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
	I/O behind bridge: 0000f000-00000fff
	Memory behind bridge: e0100000-e01fffff
	Prefetchable memory behind bridge: 00000000-000fffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-

01:00.0 SATA controller: Marvell Technology Group Ltd. Device 9170 (rev 12) (prog-if 01 [AHCI 1.0])
	Subsystem: Marvell Technology Group Ltd. Device 9170
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 105
	Region 0: I/O ports at 10010 [size=8]
	Region 1: I/O ports at 10020 [size=4]
	Region 2: I/O ports at 10018 [size=8]
	Region 3: I/O ports at 10024 [size=4]
	Region 4: I/O ports at 10000 [size=16]
	Region 5: Memory at e0010000 (32-bit, non-prefetchable) [size=512]
	Expansion ROM at e0000000 [disabled] [size=64K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: d0020a04  Data: 0f10
	Capabilities: [70] Express (v2) Legacy Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <512ns, L1 <64us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
	Kernel driver in use: ahci

02:00.0 USB controller: Fresco Logic FL1009 USB 3.0 Host Controller (rev 02) (prog-if 30 [XHCI])
	Subsystem: Fresco Logic Device 0000
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 104
	Region 0: Memory at e0100000 (64-bit, non-prefetchable) [size=64K]
	Region 2: Memory at e0110000 (64-bit, non-prefetchable) [size=4K]
	Region 4: Memory at e0111000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2- AuxCurrent=375mA PME(D0+,D1+,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [70] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
		LnkCap:	Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s unlimited, L1 unlimited
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
		LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Capabilities: [b0] MSI-X: Enable+ Count=8 Masked-
		Vector table: BAR=2 offset=00000000
		PBA: BAR=4 offset=00000000
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
		AERCap:	First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
	Kernel driver in use: xhci_hcd
Arnaud Ebalard Jan. 27, 2014, 10:20 p.m. UTC | #9
Hi Willy,

Willy Tarreau <w@1wt.eu> writes:

>> > I don't know if it should have included FL1009, it was just a guess,
>> > based on the fact that the 0x1000 and 0x1400 devices did need MSI
>> > disabled.  I can attempt to ask the Fresco Logic folks I know, but I'm
>> > not sure if/when I'll get a response back.
>> >
>> > That still doesn't necessarily rule out MSI issues in the Marvell PCI
>> > host controller code.  Can you attach another PCI device with MSI
>> > support under the host and see if it works?
>> 
>> The various Armada-based devices I have are NAS which do not have PCIe
>> slots to plug additional devices (everything is soldered). I don't know
>> which device Thomas used for its tests. Just in case, I also added Willy
>> in CC: who have various boards and may also have done more test with
>> additional PCIe devices and CONFIG_PCI_MSI enabled on 3.13 kernel.
>
> I've been running an intel i350 dual-port NIC (igb driver) supporting
> MSI on the mirabox, and it used to work in 3.10+many of the patches
> coming from the Free-electrons team. Some recent changes to the PCI
> code introduced a regression preventing this driver from correctly
> registering an MSI interrupt, and I did not have enough time to
> investigate it deep enough to fix it. That said, I know how to hack
> it to work again, so if it can be of any use, I can run a test on
> the mirabox (armada370) and on the XPGP board (armadaXP).

Thanks for the proposal, Willy. I guess Thomas can tell better than me
what kind of tests would help ruling out a problem in MSI support and
put the blame on FL chip ;-) Thomas, if you need me to test something on
some of my platforms, do not hesitate.

Cheers,

a+
Arnaud Ebalard Feb. 10, 2014, 6:57 p.m. UTC | #10
Hi Sarah,

Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:

> On Wed, Jan 22, 2014 at 11:43:16PM +0100, Arnaud Ebalard wrote:
>> Hi Jason,
>> 
>> Jason Cooper <jason@lakedaemon.net> writes:
>> 
>> > On Wed, Jan 22, 2014 at 11:23:23PM +0100, Arnaud Ebalard wrote:
>> >> With the patch applied on top of 3.13.0 kernel recompiled w/
>> >> CONFIG_PCI_MSI enabled, I cannot reproduce the bug. I guess
>> >> you can add my:
>> >> 
>> >>  Reported-and-tested-By: Arnaud Ebalard <arno@natisbad.org>
>> >> 
>> >> Since you'll have to push the patch to -stable team at least for 3.13,
>> >> I wonder if it would not make sense to extend that at least to 3.12.
>> >> and possibly 3.10 (3.2 is still widely used but I wonder if it makes
>> >> sense to go that far).
>> >
>> > Can you pinpoint the commit that introduced the regression?
>> 
>> f5182b4155b9d686c5540a6822486400e34ddd98 "xhci: Disable MSI for some Fresco Logic hosts."
>> 
>> Technically, this is not per se the commit which introduced the
>> regression but the one that *partially* fixed it by introducing the XHCI
>> quirk to skip MSI enabling for Fresco Logic chips. The thing is it
>> should have included the FL1009 in the targets. Sarah, can you confirm
>> this?
>
> I don't know if it should have included FL1009, it was just a guess,
> based on the fact that the 0x1000 and 0x1400 devices did need MSI
> disabled.  I can attempt to ask the Fresco Logic folks I know, but I'm
> not sure if/when I'll get a response back.
>
> That still doesn't necessarily rule out MSI issues in the Marvell PCI
> host controller code.  Can you attach another PCI device with MSI
> support under the host and see if it works?

Unless you have some objections or some positive feedback from Fresco
Logic people, can you queue your quirks for FL1009 for 3.14-rc* and
-stable? Note that I am just asking, i.e. if you want to wait a bit
more, I am not that in a hurry.

Cheers,

a+
Sarah Sharp Feb. 14, 2014, 12:09 a.m. UTC | #11
On Mon, Feb 10, 2014 at 07:57:42PM +0100, Arnaud Ebalard wrote:
> Sarah Sharp <sarah.a.sharp@linux.intel.com> writes:
> 
> > On Wed, Jan 22, 2014 at 11:43:16PM +0100, Arnaud Ebalard wrote:
> >> f5182b4155b9d686c5540a6822486400e34ddd98 "xhci: Disable MSI for some Fresco Logic hosts."
> >> 
> >> Technically, this is not per se the commit which introduced the
> >> regression but the one that *partially* fixed it by introducing the XHCI
> >> quirk to skip MSI enabling for Fresco Logic chips. The thing is it
> >> should have included the FL1009 in the targets. Sarah, can you confirm
> >> this?
> >
> > I don't know if it should have included FL1009, it was just a guess,
> > based on the fact that the 0x1000 and 0x1400 devices did need MSI
> > disabled.  I can attempt to ask the Fresco Logic folks I know, but I'm
> > not sure if/when I'll get a response back.
> >
> > That still doesn't necessarily rule out MSI issues in the Marvell PCI
> > host controller code.  Can you attach another PCI device with MSI
> > support under the host and see if it works?
> 
> Unless you have some objections or some positive feedback from Fresco
> Logic people, can you queue your quirks for FL1009 for 3.14-rc* and
> -stable? Note that I am just asking, i.e. if you want to wait a bit
> more, I am not that in a hurry.

Sorry for not getting back to you sooner.  The Fresco Logic folks said
that the FL1000 and FL1400 hosts are actually the same chipset, and it
doesn't support MSI.  However, they say the FL1009 *should* support MSI.

So that doesn't rule out issues with the Marvell PCI MSI code.  I
suspect that's actually the root cause, since I haven't gotten any bug
reports that the FL1009 doesn't work with MSI enabled on other systems.

Sarah Sharp
Thomas Petazzoni Feb. 14, 2014, 8:26 a.m. UTC | #12
Sarah, Arnaud,

On Thu, 13 Feb 2014 16:09:10 -0800, Sarah Sharp wrote:

> > Unless you have some objections or some positive feedback from Fresco
> > Logic people, can you queue your quirks for FL1009 for 3.14-rc* and
> > -stable? Note that I am just asking, i.e. if you want to wait a bit
> > more, I am not that in a hurry.
> 
> Sorry for not getting back to you sooner.  The Fresco Logic folks said
> that the FL1000 and FL1400 hosts are actually the same chipset, and it
> doesn't support MSI.  However, they say the FL1009 *should* support MSI.
> 
> So that doesn't rule out issues with the Marvell PCI MSI code.  I
> suspect that's actually the root cause, since I haven't gotten any bug
> reports that the FL1009 doesn't work with MSI enabled on other systems.

Ok, I'll try to have a look into this, by re-reading the entire
thread, and trying to propose some patches that add debugging details
in the Marvell PCI MSI code to try to understand what's going on.

Thanks!

Thomas
diff mbox

Patch

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 6c03584ac15f..74748444c040 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -30,6 +30,7 @@ 
 /* Device for a quirk */
 #define PCI_VENDOR_ID_FRESCO_LOGIC	0x1b73
 #define PCI_DEVICE_ID_FRESCO_LOGIC_PDK	0x1000
+#define PCI_DEVICE_ID_FRESCO_LOGIC_FL1009	0x1009
 #define PCI_DEVICE_ID_FRESCO_LOGIC_FL1400	0x1400
 
 #define PCI_VENDOR_ID_ETRON		0x1b6f
@@ -63,6 +64,9 @@  static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
 
 	/* Look for vendor-specific quirks */
 	if (pdev->vendor == PCI_VENDOR_ID_FRESCO_LOGIC &&
+			pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_FL1009)
+		xhci->quirks |= XHCI_BROKEN_MSI;
+	if (pdev->vendor == PCI_VENDOR_ID_FRESCO_LOGIC &&
 			(pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_PDK ||
 			 pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_FL1400)) {
 		if (pdev->device == PCI_DEVICE_ID_FRESCO_LOGIC_PDK &&