Message ID | 599D3410.9050504@intel.com (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | Bjorn Helgaas |
Headers | show |
On 23/08/2017 09:51, Mathias Nyman wrote: > On 23.08.2017 09:07, Felipe Balbi wrote: > >> Mason writes: >> >>> Any idea what could have changed between 4.9 and 4.13 ? >> >> Quite a bit: >> >> $ git rev-list --no-merges --count v4.13-rc6 ^v4.9 -- drivers/usb/host/xhci drivers/usb/core/ >> 58 > > very likely cause is the more aggressive detection of pci removed xhci hosts > > See commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b > xhci: Rework how we handle unresponsive or hoptlug removed hosts > > It checks if a xhci register reads returns 0xffffffff and assumes xhci > died in that case. > > Could you add something like the below to check which what is killing the host? > Or a BUG()/WARN() in xhci_hc_died() to get a backtrace of who called it. > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > index 51cd4b8..ade2ad6 100644 > --- a/drivers/usb/host/xhci-ring.c > +++ b/drivers/usb/host/xhci-ring.c > @@ -922,7 +922,8 @@ void xhci_hc_died(struct xhci_hcd *xhci) > if (xhci->xhc_state & XHCI_STATE_DYING) > return; > > - xhci_err(xhci, "xHCI host controller not responding, assume dead\n"); > + xhci_err(xhci, "xHC not responding in %pf, assume controller is dead\n", > + __builtin_return_address(0)); > xhci->xhc_state |= XHCI_STATE_DYING; > > xhci_cleanup_command_queue(xhci); I'll try some coarse bisection to narrow it down. $ git describe --contains d9f11ba9f107aa335091ab8d7ba5eea714e46e8b v4.12-rc1~97^2~39 I'll check 4.11 first. I wanted to mention that the XHCI setup on 4.9 and 4.13 print slightly different things (at the beginning). On 4.9 [ 1.240322] xhci_hcd 0000:01:00.0: xHCI Host Controller [ 1.245617] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1 [ 1.258691] xhci_hcd 0000:01:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010 [ 1.268090] hub 1-0:1.0: USB hub found [ 1.271905] hub 1-0:1.0: 4 ports detected [ 1.276372] xhci_hcd 0000:01:00.0: xHCI Host Controller [ 1.281645] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2 [ 1.289173] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM. [ 1.297775] hub 2-0:1.0: USB hub found [ 1.301577] hub 2-0:1.0: 4 ports detected [ 1.306194] usbcore: registered new interface driver usb-storage On 4.13 [ 1.222471] pcieport 0000:00:00.0: of_irq_parse_pci: failed with rc=-22 [ 1.229156] xhci_hcd 0000:01:00.0: Resetting [ 2.268836] xhci_hcd 0000:01:00.0: xHCI Host Controller [ 2.274126] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1 [ 2.287222] xhci_hcd 0000:01:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010 [ 2.296653] hub 1-0:1.0: USB hub found [ 2.300478] hub 1-0:1.0: 4 ports detected [ 2.304962] xhci_hcd 0000:01:00.0: xHCI Host Controller [ 2.310246] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2 [ 2.317776] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM. [ 2.326419] hub 2-0:1.0: USB hub found [ 2.330229] hub 2-0:1.0: 4 ports detected [ 2.334869] usbcore: registered new interface driver usb-storage FWIW, "of_irq_parse_pci: failed with rc=-22" seems to come from: [ 1.257411] [<c03d80c8>] (of_irq_parse_pci) from [<c03d8270>] (of_irq_parse_and_map_pci+0x10/0x2c) [ 1.266420] [<c03d8270>] (of_irq_parse_and_map_pci) from [<c03100a8>] (pci_assign_irq+0x78/0xb0) [ 1.275254] [<c03100a8>] (pci_assign_irq) from [<c030a1c8>] (pci_device_probe+0x18/0x128) [ 1.283476] [<c030a1c8>] (pci_device_probe) from [<c0357864>] (driver_probe_device+0x244/0x2c8) The error logging was added by f1aa54840657f No, that just turned one specific error into a warning. Need to dig a bit more. Regards.
On 23/08/2017 09:51, Mathias Nyman wrote: > very likely cause is the more aggressive detection of pci removed xhci hosts > > See commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b > xhci: Rework how we handle unresponsive or hoptlug removed hosts > > It checks if a xhci register reads returns 0xffffffff and assumes xhci > died in that case. > > Could you add something like the below to check which what is killing the host? > Or a BUG()/WARN() in xhci_hc_died() to get a backtrace of who called it. [ 46.525247] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd [ 46.565496] usb-storage 2-2:1.0: USB Mass Storage device detected [ 46.571934] scsi host0: usb-storage 2-2:1.0 [ 47.601227] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6 [ 47.611340] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB) [ 47.621624] sd 0:0:0:0: [sda] Write Protect is off [ 47.627131] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 47.639637] sda: sda1 [ 47.648091] sd 0:0:0:0: [sda] Attached SCSI removable disk [ 58.100306] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead [ 58.108021] CPU: 0 PID: 939 Comm: kworker/0:2 Tainted: G C 4.13.0-rc6 #11 [ 58.115976] Hardware name: Sigma Tango DT [ 58.120016] Workqueue: usb_hub_wq hub_event [ 58.124241] [<c010f288>] (unwind_backtrace) from [<c010af58>] (show_stack+0x10/0x14) [ 58.132033] [<c010af58>] (show_stack) from [<c049d714>] (dump_stack+0x84/0x98) [ 58.139302] [<c049d714>] (dump_stack) from [<c03b090c>] (xhci_hc_died.part.9+0x50/0x23c) [ 58.147438] [<c03b090c>] (xhci_hc_died.part.9) from [<c03b5d80>] (xhci_hub_control+0xf3c/0x175c) [ 58.156273] [<c03b5d80>] (xhci_hub_control) from [<c03934a4>] (usb_hcd_submit_urb+0x264/0x814) [ 58.164932] [<c03934a4>] (usb_hcd_submit_urb) from [<c0394fa4>] (usb_start_wait_urb+0x4c/0xbc) [ 58.173591] [<c0394fa4>] (usb_start_wait_urb) from [<c03950b4>] (usb_control_msg+0xa0/0xcc) [ 58.181985] [<c03950b4>] (usb_control_msg) from [<c038bf54>] (usb_clear_port_feature+0x44/0x4c) [ 58.190730] [<c038bf54>] (usb_clear_port_feature) from [<c038c320>] (hub_port_reset+0x228/0x51c) [ 58.199561] [<c038c320>] (hub_port_reset) from [<c038fd68>] (hub_event+0x87c/0x108c) [ 58.207349] [<c038fd68>] (hub_event) from [<c012ecc4>] (process_one_work+0x1d8/0x3f0) [ 58.215220] [<c012ecc4>] (process_one_work) from [<c012f8d8>] (worker_thread+0x38/0x554) [ 58.223354] [<c012f8d8>] (worker_thread) from [<c01347d0>] (kthread+0x108/0x138) [ 58.230789] [<c01347d0>] (kthread) from [<c01076d8>] (ret_from_fork+0x14/0x3c) [ 58.238056] xhci_hcd 0000:01:00.0: HC died; cleaning up [ 58.243391] usb 2-2: USB disconnect, device number 2
On 23/08/2017 09:51, Mathias Nyman wrote: > very likely cause is the more aggressive detection of pci removed xhci hosts > > See commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b > xhci: Rework how we handle unresponsive or hoptlug removed hosts > > It checks if a xhci register reads returns 0xffffffff and assumes xhci > died in that case. I've just tested 4.11.12 + a few local patches to back-port PCIe host bridge support. It "works" as well as 4.9 (i.e. modulo the "AER: Uncorrected (Non-Fatal) error received") [ 0.508533] pcie_tango 50000000.pcie: simultaneous PCI config and MMIO accesses may cause data corruption [ 0.519622] OF: PCI: host bridge /soc/pcie@2e000 ranges: [ 0.519645] OF: PCI: MEM 0x50400000..0x53ffffff -> 0x00400000 [ 0.519725] pcie_tango 50000000.pcie: ECAM at [mem 0x50000000-0x503fffff] for [bus 00-03] [ 0.519872] pcie_tango 50000000.pcie: PCI host bridge to bus 0000:00 [ 0.519886] pci_bus 0000:00: root bus resource [bus 00-03] [ 0.519898] pci_bus 0000:00: root bus resource [mem 0x50400000-0x53ffffff] (bus address [0x00400000-0x03ffffff]) [ 0.520201] PCI: bus0: Fast back to back transfers disabled [ 0.520213] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring [ 0.520922] PCI: bus1: Fast back to back transfers disabled [ 0.520964] pci 0000:00:00.0: of_irq_parse_pci: failed with rc=-22 [ 0.520993] pci 0000:00:00.0: BAR 8: assigned [mem 0x50400000-0x504fffff] [ 0.521004] pci 0000:01:00.0: BAR 0: assigned [mem 0x50400000-0x50401fff 64bit] [ 0.521025] pci 0000:00:00.0: PCI bridge to [bus 01] [ 0.521033] pci 0000:00:00.0: bridge window [mem 0x50400000-0x504fffff] [ 0.521085] pcieport 0000:00:00.0: enabling device (0140 -> 0142) [ 0.521282] pcieport 0000:00:00.0: Signaling PME with IRQ 30 [ 0.521402] pcieport 0000:00:00.0: AER enabled with IRQ 30 [ 0.521526] pci 0000:01:00.0: enabling device (0140 -> 0142) ... [ 1.239706] xhci_hcd 0000:01:00.0: xHCI Host Controller [ 1.244998] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1 [ 1.258048] xhci_hcd 0000:01:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010 [ 1.267467] hub 1-0:1.0: USB hub found [ 1.271287] hub 1-0:1.0: 4 ports detected [ 1.275761] xhci_hcd 0000:01:00.0: xHCI Host Controller [ 1.281048] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2 [ 1.288578] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM. [ 1.297234] hub 2-0:1.0: USB hub found [ 1.301042] hub 2-0:1.0: 4 ports detected [ 1.305681] usbcore: registered new interface driver usb-storage PLUG #1 [ 26.104607] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd [ 26.143799] usb-storage 2-2:1.0: USB Mass Storage device detected [ 26.150253] scsi host0: usb-storage 2-2:1.0 [ 27.177298] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6 [ 27.187586] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB) [ 27.199000] sd 0:0:0:0: [sda] Write Protect is off [ 27.204186] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 27.216322] sda: sda1 [ 27.220584] sd 0:0:0:0: [sda] Attached SCSI removable disk [ 27.252046] random: fast init done UNPLUG #1 [ 37.334040] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 37.342135] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 37.353970] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 37.362589] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 37.369485] pcieport 0000:00:00.0: AER: Device recovery failed [ 38.066538] xhci_hcd 0000:01:00.0: Cannot set link state. [ 38.072039] usb usb2-port2: cannot disable (err = -32) [ 38.077348] usb 2-2: USB disconnect, device number 2 [ 38.082711] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 38.094279] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 38.108006] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 38.116878] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 38.123954] pcieport 0000:00:00.0: AER: Device recovery failed PLUG #2 [ 55.097922] usb 2-2: new SuperSpeed USB device number 3 using xhci_hcd [ 55.137590] usb-storage 2-2:1.0: USB Mass Storage device detected [ 55.144016] scsi host0: usb-storage 2-2:1.0 [ 56.163907] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6 [ 56.174851] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB) [ 56.184218] sd 0:0:0:0: [sda] Write Protect is off [ 56.190162] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 56.202117] sda: sda1 [ 56.207112] sd 0:0:0:0: [sda] Attached SCSI removable disk UNPLUG #2 [ 63.228310] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 63.236403] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 63.248220] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 63.256653] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 63.263523] pcieport 0000:00:00.0: AER: Device recovery failed [ 63.959768] xhci_hcd 0000:01:00.0: Cannot set link state. [ 63.965227] usb usb2-port2: cannot disable (err = -32) [ 63.970409] usb 2-2: USB disconnect, device number 3 [ 63.975664] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 63.987356] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 64.000021] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 64.008655] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 64.015553] pcieport 0000:00:00.0: AER: Device recovery failed [ 64.021449] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 64.029580] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 64.041410] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 64.049818] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 64.056658] pcieport 0000:00:00.0: AER: Device recovery failed Bjorn, What do you make of the AER logs? What can I do to debug this issue? Regards. FWIW, verbose lspci output below. # lspci -vv 00:00.0 PCI bridge: Sigma Designs, Inc. Device 0024 (rev 01) (prog-if 00 [Normal decode]) Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin ? routed to IRQ 30 Region 0: Memory at <ignored> (64-bit, non-prefetchable) Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 I/O behind bridge: 00000000-00000fff Memory behind bridge: 00400000-004fffff Prefetchable memory behind bridge: 00000000-000fffff Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR- BridgeCtl: Parity+ SERR- NoISA- VGA- MAbort- >Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [50] MSI: Enable+ Count=1/4 Maskable- 64bit+ Address: 00000000a002e07c Data: 0000 Capabilities: [78] Power Management version 3 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=3 PME- Capabilities: [80] Express (v2) Root Port (Slot-), MSI 03 DevCap: MaxPayload 256 bytes, PhantFunc 0 ExtTag- RBE+ DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+ RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq- AuxPwr- TransPend+ LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <4us ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp- LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt- RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible- RootCap: CRSVisible- RootSta: PME ReqID 0000, PMEStatus- PMEPending- DevCap2: Completion Timeout: Range B, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd- DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd- LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [100 v1] Virtual Channel Caps: LPEVC=0 RefClk=100ns PATEntryBits=1 Arb: Fixed- WRR32- WRR64- WRR128- Ctrl: ArbSelect=Fixed Status: InProgress- VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256- Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff Status: NegoPending- InProgress- Capabilities: [800 v1] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ AERCap: First Error Pointer: 0e, GenCap- CGenEn- ChkCap- ChkEn- Kernel driver in use: pcieport 01:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI]) Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- Latency: 0, Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 0 Region 0: Memory at 50400000 (64-bit, non-prefetchable) [size=8K] Capabilities: [50] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+ Address: 0000000000000000 Data: 0000 Capabilities: [90] MSI-X: Enable+ Count=8 Masked- Vector table: BAR=0 offset=00001000 PBA: BAR=0 offset=00001080 Capabilities: [a0] Express (v2) Endpoint, MSI 00 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 512 bytes DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp- LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis- Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- Compliance De-emphasis: -6dB LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- Capabilities: [100 v1] Advanced Error Reporting UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn- Capabilities: [150 v1] Latency Tolerance Reporting Max snoop latency: 0ns Max no snoop latency: 0ns Kernel driver in use: xhci_hcd
On 23.08.2017 12:31, Mason wrote: > On 23/08/2017 09:51, Mathias Nyman wrote: > >> very likely cause is the more aggressive detection of pci removed xhci hosts >> >> See commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b >> xhci: Rework how we handle unresponsive or hoptlug removed hosts >> >> It checks if a xhci register reads returns 0xffffffff and assumes xhci >> died in that case. >> >> Could you add something like the below to check which what is killing the host? >> Or a BUG()/WARN() in xhci_hc_died() to get a backtrace of who called it. > > [ 46.525247] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd > [ 46.565496] usb-storage 2-2:1.0: USB Mass Storage device detected > [ 46.571934] scsi host0: usb-storage 2-2:1.0 > [ 47.601227] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6 > [ 47.611340] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB) > [ 47.621624] sd 0:0:0:0: [sda] Write Protect is off > [ 47.627131] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA > [ 47.639637] sda: sda1 > [ 47.648091] sd 0:0:0:0: [sda] Attached SCSI removable disk > [ 58.100306] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead > [ 58.108021] CPU: 0 PID: 939 Comm: kworker/0:2 Tainted: G C 4.13.0-rc6 #11 > [ 58.115976] Hardware name: Sigma Tango DT > [ 58.120016] Workqueue: usb_hub_wq hub_event > [ 58.124241] [<c010f288>] (unwind_backtrace) from [<c010af58>] (show_stack+0x10/0x14) > [ 58.132033] [<c010af58>] (show_stack) from [<c049d714>] (dump_stack+0x84/0x98) > [ 58.139302] [<c049d714>] (dump_stack) from [<c03b090c>] (xhci_hc_died.part.9+0x50/0x23c) > [ 58.147438] [<c03b090c>] (xhci_hc_died.part.9) from [<c03b5d80>] (xhci_hub_control+0xf3c/0x175c) > [ 58.156273] [<c03b5d80>] (xhci_hub_control) from [<c03934a4>] (usb_hcd_submit_urb+0x264/0x814) > [ 58.164932] [<c03934a4>] (usb_hcd_submit_urb) from [<c0394fa4>] (usb_start_wait_urb+0x4c/0xbc) > [ 58.173591] [<c0394fa4>] (usb_start_wait_urb) from [<c03950b4>] (usb_control_msg+0xa0/0xcc) > [ 58.181985] [<c03950b4>] (usb_control_msg) from [<c038bf54>] (usb_clear_port_feature+0x44/0x4c) > [ 58.190730] [<c038bf54>] (usb_clear_port_feature) from [<c038c320>] (hub_port_reset+0x228/0x51c) > [ 58.199561] [<c038c320>] (hub_port_reset) from [<c038fd68>] (hub_event+0x87c/0x108c) > [ 58.207349] [<c038fd68>] (hub_event) from [<c012ecc4>] (process_one_work+0x1d8/0x3f0) > [ 58.215220] [<c012ecc4>] (process_one_work) from [<c012f8d8>] (worker_thread+0x38/0x554) > [ 58.223354] [<c012f8d8>] (worker_thread) from [<c01347d0>] (kthread+0x108/0x138) > [ 58.230789] [<c01347d0>] (kthread) from [<c01076d8>] (ret_from_fork+0x14/0x3c) > [ 58.238056] xhci_hcd 0000:01:00.0: HC died; cleaning up > [ 58.243391] usb 2-2: USB disconnect, device number 2 > -- xhci driver reads 0xffffffff from a mmio mapped xhci portsc register and bails out in: xhci-hub.c: temp = readl(port_array[wIndex]); if (temp == ~(u32)0) { xhci_hc_died(xhci); retval = -ENODEV; break; } In this case we read the register when hub thread asks to clear port feature. why portsc returns 0xffffffff is a nother quiestion, could the hub thread be running while xhci controller is (in D3)? Was xhci runtime suspended? There were some pcieport errors in another log you showed, maybe PCI devices are not properly recovered and the registers return 0xffffffff? -Mathias
On 23/08/2017 13:11, Mathias Nyman wrote: > On 23.08.2017 12:31, Mason wrote: > >> [ 46.525247] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd >> [ 46.565496] usb-storage 2-2:1.0: USB Mass Storage device detected >> [ 46.571934] scsi host0: usb-storage 2-2:1.0 >> [ 47.601227] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6 >> [ 47.611340] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB) >> [ 47.621624] sd 0:0:0:0: [sda] Write Protect is off >> [ 47.627131] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA >> [ 47.639637] sda: sda1 >> [ 47.648091] sd 0:0:0:0: [sda] Attached SCSI removable disk >> [ 58.100306] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead >> [ 58.108021] CPU: 0 PID: 939 Comm: kworker/0:2 Tainted: G C 4.13.0-rc6 #11 >> [ 58.115976] Hardware name: Sigma Tango DT >> [ 58.120016] Workqueue: usb_hub_wq hub_event >> [ 58.124241] [<c010f288>] (unwind_backtrace) from [<c010af58>] (show_stack+0x10/0x14) >> [ 58.132033] [<c010af58>] (show_stack) from [<c049d714>] (dump_stack+0x84/0x98) >> [ 58.139302] [<c049d714>] (dump_stack) from [<c03b090c>] (xhci_hc_died.part.9+0x50/0x23c) >> [ 58.147438] [<c03b090c>] (xhci_hc_died.part.9) from [<c03b5d80>] (xhci_hub_control+0xf3c/0x175c) >> [ 58.156273] [<c03b5d80>] (xhci_hub_control) from [<c03934a4>] (usb_hcd_submit_urb+0x264/0x814) >> [ 58.164932] [<c03934a4>] (usb_hcd_submit_urb) from [<c0394fa4>] (usb_start_wait_urb+0x4c/0xbc) >> [ 58.173591] [<c0394fa4>] (usb_start_wait_urb) from [<c03950b4>] (usb_control_msg+0xa0/0xcc) >> [ 58.181985] [<c03950b4>] (usb_control_msg) from [<c038bf54>] (usb_clear_port_feature+0x44/0x4c) >> [ 58.190730] [<c038bf54>] (usb_clear_port_feature) from [<c038c320>] (hub_port_reset+0x228/0x51c) >> [ 58.199561] [<c038c320>] (hub_port_reset) from [<c038fd68>] (hub_event+0x87c/0x108c) >> [ 58.207349] [<c038fd68>] (hub_event) from [<c012ecc4>] (process_one_work+0x1d8/0x3f0) >> [ 58.215220] [<c012ecc4>] (process_one_work) from [<c012f8d8>] (worker_thread+0x38/0x554) >> [ 58.223354] [<c012f8d8>] (worker_thread) from [<c01347d0>] (kthread+0x108/0x138) >> [ 58.230789] [<c01347d0>] (kthread) from [<c01076d8>] (ret_from_fork+0x14/0x3c) >> [ 58.238056] xhci_hcd 0000:01:00.0: HC died; cleaning up >> [ 58.243391] usb 2-2: USB disconnect, device number 2 > > xhci driver reads 0xffffffff from a mmio mapped xhci portsc register and bails out in: > xhci-hub.c: > temp = readl(port_array[wIndex]); > if (temp == ~(u32)0) { > xhci_hc_died(xhci); > retval = -ENODEV; > break; > } > > In this case we read the register when hub thread asks to clear port feature. > > why portsc returns 0xffffffff is a another question, could the hub thread be running while xhci controller is (in D3)? > Was xhci runtime suspended? How do I tell? Should I disable SUSPEND support and all kinds of power management? > There were some pcieport errors in another log you showed, maybe PCI devices are not properly recovered > and the registers return 0xffffffff? FWIW, I just compiled v4.12-rc1 and I do get the broken behavior. v4.11.12 = OK v4.12-rc1 = KO PLUG [ 17.226953] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd [ 17.267195] usb-storage 2-2:1.0: USB Mass Storage device detected [ 17.273612] scsi host0: usb-storage 2-2:1.0 [ 18.296369] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6 [ 18.307772] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB) [ 18.316991] sd 0:0:0:0: [sda] Write Protect is off [ 18.322588] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 18.334828] sda: sda1 [ 18.339507] sd 0:0:0:0: [sda] Attached SCSI removable disk [ 18.366202] random: fast init done UNPLUG [ 21.314111] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 21.322219] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 21.334039] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 21.342453] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 21.349306] pcieport 0000:00:00.0: AER: Device recovery failed [ 22.055471] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead [ 22.063187] xhci_hcd 0000:01:00.0: HC died; cleaning up [ 22.068523] usb 2-2: USB disconnect, device number 2 [ 22.073774] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 22.085369] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 22.098823] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 22.107245] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 22.114130] pcieport 0000:00:00.0: AER: Device recovery failed [ 22.120026] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 22.128096] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 22.139916] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 22.148320] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 22.155162] pcieport 0000:00:00.0: AER: Device recovery failed The defconfig I used for testing: # CONFIG_SWAP is not set CONFIG_SYSVIPC=y CONFIG_NO_HZ_IDLE=y CONFIG_HIGH_RES_TIMERS=y # CONFIG_COMPAT_BRK is not set CONFIG_SLAB=y CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODVERSIONS=y CONFIG_ARCH_TANGO=y # CONFIG_ARM_ERRATA_643719 is not set CONFIG_PCI=y CONFIG_PCIEPORTBUS=y CONFIG_PCI_MSI=y CONFIG_PCIE_TANGO_SMP8759=y CONFIG_SMP=y CONFIG_PREEMPT=y CONFIG_HZ_300=y CONFIG_AEABI=y CONFIG_HIGHMEM=y # CONFIG_ATAGS is not set CONFIG_ARM_APPENDED_DTB=y CONFIG_ARM_ATAG_DTB_COMPAT=y CONFIG_CPU_FREQ=y CONFIG_CPU_FREQ_GOV_ONDEMAND=y CONFIG_CPUFREQ_DT=y CONFIG_VFP=y CONFIG_NEON=y CONFIG_NET=y CONFIG_PACKET=y CONFIG_UNIX=y CONFIG_INET=y CONFIG_IP_MULTICAST=y CONFIG_IP_PNP=y CONFIG_IP_PNP_DHCP=y # CONFIG_INET_XFRM_MODE_TRANSPORT is not set # CONFIG_INET_XFRM_MODE_TUNNEL is not set # CONFIG_INET_XFRM_MODE_BEET is not set # CONFIG_IPV6 is not set CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" CONFIG_DEVTMPFS=y CONFIG_DEVTMPFS_MOUNT=y CONFIG_BLK_DEV_LOOP=y CONFIG_SCSI=y CONFIG_BLK_DEV_SD=y CONFIG_NETDEVICES=y CONFIG_NET_VENDOR_AURORA=y CONFIG_AURORA_NB8800=y CONFIG_AT803X_PHY=y # CONFIG_WLAN is not set # CONFIG_INPUT_KEYBOARD is not set # CONFIG_INPUT_MOUSE is not set # CONFIG_SERIO is not set CONFIG_SERIAL_8250=y # CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_8250_RT288X=y CONFIG_SERIAL_OF_PLATFORM=y # CONFIG_HW_RANDOM is not set CONFIG_I2C=y CONFIG_I2C_XLR=y CONFIG_GPIOLIB=y CONFIG_THERMAL=y CONFIG_CPU_THERMAL=y CONFIG_TANGO_THERMAL=y CONFIG_WATCHDOG=y CONFIG_TANGOX_WATCHDOG=y CONFIG_FB=y # CONFIG_HID is not set # CONFIG_USB_HID is not set CONFIG_USB=y CONFIG_USB_XHCI_HCD=y CONFIG_USB_STORAGE=y CONFIG_EXT4_FS=y CONFIG_FUSE_FS=m CONFIG_VFAT_FS=m CONFIG_TMPFS=y CONFIG_NFS_FS=y # CONFIG_NFS_V2 is not set CONFIG_ROOT_NFS=y CONFIG_NLS_CODEPAGE_437=m CONFIG_NLS_ISO8859_1=m CONFIG_NLS_UTF8=m CONFIG_PRINTK_TIME=y # CONFIG_CRYPTO_ECHAINIV is not set
On 23/08/2017 13:54, Mason wrote: > On 23/08/2017 13:11, Mathias Nyman wrote: > >> In this case we read the register when hub thread asks to clear port feature. >> >> why portsc returns 0xffffffff is a another question, could the hub thread be running while xhci controller is (in D3)? >> Was xhci runtime suspended? > > How do I tell? > Should I disable SUSPEND support and all kinds of power management? I compiled a minimal kernel, with lots of irrelevant drivers and frameworks left out, including power management. I still get the "xHCI host controller not responding, assume dead" issue. PLUG [ 59.803499] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd [ 59.836902] usb 2-2: New USB device found, idVendor=0951, idProduct=1666 [ 59.843653] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 59.850900] usb 2-2: Product: DataTraveler 3.0 [ 59.855417] usb 2-2: Manufacturer: Kingston [ 59.859661] usb 2-2: SerialNumber: 002618887865F0C0F8646BFA [ 59.868249] usb-storage 2-2:1.0: USB Mass Storage device detected [ 59.874691] scsi host0: usb-storage 2-2:1.0 [ 60.882801] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6 [ 60.891640] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB) [ 60.899662] sd 0:0:0:0: [sda] Write Protect is off [ 60.904763] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 60.916154] sda: sda1 [ 60.919798] sd 0:0:0:0: [sda] Attached SCSI removable disk UNPLUG [ 70.545087] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 70.553169] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 70.565084] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 70.573528] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 70.580402] pcieport 0000:00:00.0: AER: Device recovery failed [ 71.275253] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead [ 71.282956] xhci_hcd 0000:01:00.0: HC died; cleaning up [ 71.288304] usb 2-2: USB disconnect, device number 2 [ 71.293445] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 71.301851] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 71.313785] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 71.322240] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 71.329115] pcieport 0000:00:00.0: AER: Device recovery failed [ 71.335042] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 71.343137] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 71.354984] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 71.363443] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 71.370289] pcieport 0000:00:00.0: AER: Device recovery failed defconfig for reference # CONFIG_SWAP is not set CONFIG_SYSVIPC=y CONFIG_NO_HZ_IDLE=y CONFIG_HIGH_RES_TIMERS=y # CONFIG_COMPAT_BRK is not set CONFIG_SLAB=y CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODVERSIONS=y CONFIG_ARCH_TANGO=y # CONFIG_ARM_ERRATA_643719 is not set CONFIG_PCI=y CONFIG_PCIEPORTBUS=y CONFIG_PCI_MSI=y CONFIG_PCIE_TANGO_SMP8759=y CONFIG_SMP=y CONFIG_PREEMPT=y CONFIG_HZ_300=y CONFIG_AEABI=y CONFIG_HIGHMEM=y # CONFIG_ATAGS is not set CONFIG_ARM_APPENDED_DTB=y CONFIG_ARM_ATAG_DTB_COMPAT=y CONFIG_VFP=y CONFIG_NEON=y # CONFIG_SUSPEND is not set CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug" CONFIG_DEVTMPFS=y CONFIG_DEVTMPFS_MOUNT=y CONFIG_BLK_DEV_LOOP=y CONFIG_SCSI=y CONFIG_BLK_DEV_SD=y # CONFIG_INPUT_KEYBOARD is not set # CONFIG_INPUT_MOUSE is not set # CONFIG_SERIO is not set CONFIG_SERIAL_8250=y # CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set CONFIG_SERIAL_8250_CONSOLE=y CONFIG_SERIAL_8250_RT288X=y CONFIG_SERIAL_OF_PLATFORM=y # CONFIG_HW_RANDOM is not set # CONFIG_HWMON is not set # CONFIG_HID is not set # CONFIG_USB_HID is not set CONFIG_USB=y CONFIG_USB_ANNOUNCE_NEW_DEVICES=y CONFIG_USB_XHCI_HCD=y CONFIG_USB_STORAGE=y CONFIG_VFAT_FS=m CONFIG_TMPFS=y CONFIG_NLS_CODEPAGE_437=m CONFIG_NLS_ISO8859_1=m CONFIG_NLS_UTF8=m CONFIG_PRINTK_TIME=y
On 23/08/2017 14:41, Mason wrote: > I compiled a minimal kernel, with lots of irrelevant drivers and > frameworks left out, including power management. I still get the > "xHCI host controller not responding, assume dead" issue. The problem seems to have a timing-related aspect. I added a bunch of logs (to a slow serial console) and the HC was not killed. I was able to plug the Flash drive a second time. (I am logging config space reads and writes.) [ 1.098314] READ: bus=1 devfn=0 where=84 size=2 val=0x8 [ 1.103779] READ: bus=1 devfn=0 where=4 size=2 val=0x142 [ 1.109315] READ: bus=1 devfn=0 where=61 size=1 val=0x1 [ 1.114746] READ: bus=1 devfn=0 where=4 size=2 val=0x142 [ 1.120311] READ: bus=1 devfn=0 where=4 size=2 val=0x142 [ 1.125841] WRITE: bus=1 devfn=0 where=4 size=2 val=0x146 NB: I added msleep(2500) in usb_add_hcd() [ 3.681867] xhci_hcd 0000:01:00.0: xHCI Host Controller [ 3.687154] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1 [ 3.694656] READ: bus=1 devfn=0 where=96 size=1 val=0x30 [ 3.705736] xhci_hcd 0000:01:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010 [ 3.714233] READ: bus=1 devfn=0 where=12 size=1 val=0x10 [ 3.719752] READ: bus=1 devfn=0 where=4 size=2 val=0x146 [ 3.725269] WRITE: bus=1 devfn=0 where=4 size=2 val=0x156 [ 3.730794] READ: bus=1 devfn=0 where=146 size=2 val=0x7 [ 3.736314] READ: bus=1 devfn=0 where=146 size=2 val=0x7 [ 3.741835] WRITE: bus=1 devfn=0 where=146 size=2 val=0x7 [ 3.747354] READ: bus=1 devfn=0 where=146 size=2 val=0x7 [ 3.752871] READ: bus=1 devfn=0 where=148 size=4 val=0x1000 [ 3.758775] READ: bus=1 devfn=0 where=146 size=2 val=0x7 [ 3.764297] WRITE: bus=1 devfn=0 where=146 size=2 val=0xc007 [ 3.770108] READ: bus=1 devfn=0 where=4 size=2 val=0x146 [ 3.775626] WRITE: bus=1 devfn=0 where=4 size=2 val=0x546 [ 3.781146] READ: bus=1 devfn=0 where=146 size=2 val=0xc007 [ 3.786925] WRITE: bus=1 devfn=0 where=146 size=2 val=0x8007 [ 3.792919] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002 [ 3.799756] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 3.807021] usb usb1: Product: xHCI Host Controller [ 3.811933] usb usb1: Manufacturer: Linux 4.12.0-rc1 xhci-hcd [ 3.817713] usb usb1: SerialNumber: 0000:01:00.0 [ 3.822773] hub 1-0:1.0: USB hub found [ 3.826598] hub 1-0:1.0: 4 ports detected NB: I added msleep(2500) in usb_add_hcd() [ 6.455246] xhci_hcd 0000:01:00.0: xHCI Host Controller [ 6.460520] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2 [ 6.468028] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM. [ 6.476236] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003 [ 6.483068] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 6.490334] usb usb2: Product: xHCI Host Controller [ 6.495240] usb usb2: Manufacturer: Linux 4.12.0-rc1 xhci-hcd [ 6.501020] usb usb2: SerialNumber: 0000:01:00.0 [ 6.505994] hub 2-0:1.0: USB hub found [ 6.509806] hub 2-0:1.0: 4 ports detected [ 6.514215] usbcore: registered new interface driver usb-storage [ 6.520313] Registering SWP/SWPB emulation handler [ 6.525541] READ: bus=0 devfn=0 where=132 size=4 val=0x8001 [ 6.531334] READ: bus=0 devfn=0 where=6 size=2 val=0x4010 [ 6.536955] READ: bus=0 devfn=0 where=52 size=1 val=0x50 [ 6.542484] READ: bus=0 devfn=0 where=80 size=2 val=0x7805 [ 6.548180] READ: bus=0 devfn=0 where=120 size=2 val=0x8001 [ 6.553969] READ: bus=0 devfn=0 where=128 size=2 val=0x10 [ 6.559584] READ: bus=0 devfn=0 where=124 size=2 val=0x6008 [ 6.565387] READ: bus=1 devfn=0 where=164 size=4 val=0x8fc0 [ 6.571167] READ: bus=1 devfn=0 where=6 size=2 val=0x10 [ 6.576609] READ: bus=1 devfn=0 where=52 size=1 val=0x50 [ 6.582129] READ: bus=1 devfn=0 where=80 size=2 val=0x7001 [ 6.587821] READ: bus=1 devfn=0 where=112 size=2 val=0x9005 [ 6.593601] READ: bus=1 devfn=0 where=144 size=2 val=0xa011 [ 6.599381] READ: bus=1 devfn=0 where=160 size=2 val=0x10 [ 6.604985] READ: bus=1 devfn=0 where=84 size=2 val=0x8 [ 6.623665] Freeing unused kernel memory: 9216K PLUG #1 [ 66.783559] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd [ 66.816910] usb 2-2: New USB device found, idVendor=0951, idProduct=1666 [ 66.823661] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 66.830909] usb 2-2: Product: DataTraveler 3.0 [ 66.835417] usb 2-2: Manufacturer: Kingston [ 66.839660] usb 2-2: SerialNumber: 002618887865F0C0F8646BFA [ 66.848131] usb-storage 2-2:1.0: USB Mass Storage device detected [ 66.854584] scsi host0: usb-storage 2-2:1.0 [ 67.869446] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6 [ 67.878270] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB) [ 67.886248] sd 0:0:0:0: [sda] Write Protect is off [ 67.891347] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 67.902708] sda: sda1 [ 67.906372] sd 0:0:0:0: [sda] Attached SCSI removable disk UNPLUG #1 [ 71.697358] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 71.703572] READ: bus=0 devfn=0 where=2100 size=4 val=0x0 [ 71.709170] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 71.715569] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 71.723632] READ: bus=0 devfn=0 where=136 size=2 val=0x281f [ 71.729470] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 71.735373] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 71.741013] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 71.746914] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 71.752552] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 71.758194] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 71.770008] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 71.778494] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 71.785358] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 71.791259] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 71.796897] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 71.802524] pcieport 0000:00:00.0: AER: Device recovery failed [ 72.451908] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 72.458120] READ: bus=0 devfn=0 where=2100 size=4 val=0x0 [ 72.463717] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 72.470012] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 72.476221] READ: bus=0 devfn=0 where=2100 size=4 val=0x0 [ 72.481819] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 72.488109] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 72.494319] READ: bus=0 devfn=0 where=2100 size=4 val=0x0 [ 72.499916] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 72.506205] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 72.512415] READ: bus=0 devfn=0 where=2100 size=4 val=0x0 [ 72.518011] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 72.524263] xhci_hcd 0000:01:00.0: Cannot set link state. [ 72.529711] usb usb2-port2: cannot disable (err = -32) [ 72.534883] usb 2-2: USB disconnect, device number 2 [ 72.540042] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 72.548365] READ: bus=0 devfn=0 where=136 size=2 val=0x281f [ 72.554264] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 72.560157] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 72.565778] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 72.571654] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 72.577273] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 72.582891] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 72.594705] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 72.603122] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 72.609955] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 72.615833] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 72.621441] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 72.627061] pcieport 0000:00:00.0: AER: Device recovery failed [ 72.632931] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 72.640984] READ: bus=0 devfn=0 where=136 size=2 val=0x281f [ 72.646769] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 72.652636] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 72.658245] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 72.664114] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 72.669722] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 72.675330] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 72.687142] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 72.695545] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 72.702376] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 72.708244] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 72.713856] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 72.719473] pcieport 0000:00:00.0: AER: Device recovery failed [ 72.725342] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 72.733394] READ: bus=0 devfn=0 where=136 size=2 val=0x281f [ 72.739178] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 72.745044] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 72.750653] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 72.756520] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 72.762128] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 72.767734] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 72.779548] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 72.787950] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 72.794781] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 72.800649] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 72.806258] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 72.811873] pcieport 0000:00:00.0: AER: Device recovery failed [ 72.817741] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 72.825793] READ: bus=0 devfn=0 where=136 size=2 val=0x281f [ 72.831574] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 72.837442] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 72.843054] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 72.848922] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 72.854529] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 72.860137] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 72.871951] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 72.880353] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 72.887184] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 72.893051] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 72.898660] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 72.904273] pcieport 0000:00:00.0: AER: Device recovery failed PLUG #2 [ 165.860193] usb 2-2: new SuperSpeed USB device number 3 using xhci_hcd [ 165.893583] usb 2-2: New USB device found, idVendor=0951, idProduct=1666 [ 165.900333] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 165.907515] usb 2-2: Product: DataTraveler 3.0 [ 165.911989] usb 2-2: Manufacturer: Kingston [ 165.916198] usb 2-2: SerialNumber: 002618887865F0C0F8646BFA [ 165.924547] usb-storage 2-2:1.0: USB Mass Storage device detected [ 165.930970] scsi host0: usb-storage 2-2:1.0 [ 166.962705] scsi 0:0:0:0: Direct-Access Kingston DataTraveler 3.0 PQ: 0 ANSI: 6 [ 166.971494] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB) [ 166.979556] sd 0:0:0:0: [sda] Write Protect is off [ 166.984591] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA [ 166.995847] random: fast init done [ 166.999430] sda: sda1 [ 167.003039] sd 0:0:0:0: [sda] Attached SCSI removable disk UNPLUG #2 [ 171.918834] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 171.925046] READ: bus=0 devfn=0 where=2100 size=4 val=0x0 [ 171.930645] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 171.936941] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 171.945000] READ: bus=0 devfn=0 where=136 size=2 val=0x281f [ 171.950784] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 171.956656] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 171.962263] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 171.968134] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 171.973741] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 171.979354] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 171.991164] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 171.999597] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 172.006429] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 172.012300] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 172.017908] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 172.023529] pcieport 0000:00:00.0: AER: Device recovery failed [ 172.675221] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 172.681432] READ: bus=0 devfn=0 where=2100 size=4 val=0x0 [ 172.687030] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 172.693325] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 172.699536] READ: bus=0 devfn=0 where=2100 size=4 val=0x0 [ 172.705133] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 172.711424] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 172.717633] READ: bus=0 devfn=0 where=2100 size=4 val=0x0 [ 172.723230] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 172.729517] READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 172.735726] READ: bus=0 devfn=0 where=2100 size=4 val=0x0 [ 172.741322] WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024 [ 172.747574] xhci_hcd 0000:01:00.0: Cannot set link state. [ 172.753021] usb usb2-port2: cannot disable (err = -32) [ 172.758193] usb 2-2: USB disconnect, device number 3 [ 172.763340] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 172.771627] READ: bus=0 devfn=0 where=136 size=2 val=0x281f [ 172.777515] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 172.783408] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 172.789030] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 172.794907] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 172.800526] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 172.806146] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 172.817960] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 172.826375] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 172.833208] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 172.839078] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 172.844685] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 172.850305] pcieport 0000:00:00.0: AER: Device recovery failed [ 172.856183] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 172.864236] READ: bus=0 devfn=0 where=136 size=2 val=0x281f [ 172.870020] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 172.875889] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 172.881497] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 172.887365] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 172.892974] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 172.898582] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 172.910393] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 172.918796] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 172.925627] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 172.931494] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 172.937107] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 172.942724] pcieport 0000:00:00.0: AER: Device recovery failed [ 172.948593] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 172.956644] READ: bus=0 devfn=0 where=136 size=2 val=0x281f [ 172.962428] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 172.968295] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 172.973903] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 172.979771] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 172.985379] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 172.990985] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 173.002799] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 173.011202] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 173.018033] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 173.023901] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 173.029510] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 173.035123] pcieport 0000:00:00.0: AER: Device recovery failed [ 173.040990] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000 [ 173.049042] READ: bus=0 devfn=0 where=136 size=2 val=0x281f [ 173.054825] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 173.060693] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 173.066305] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 173.072173] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 173.077780] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 173.083388] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID) [ 173.095202] pcieport 0000:00:00.0: device [1105:0024] error status/mask=00004000/00000000 [ 173.103605] pcieport 0000:00:00.0: [14] Completion Timeout (First) [ 173.110435] READ: bus=0 devfn=0 where=2052 size=4 val=0x4000 [ 173.116303] READ: bus=0 devfn=0 where=2056 size=4 val=0x0 [ 173.121911] READ: bus=0 devfn=0 where=2072 size=4 val=0xe [ 173.127524] pcieport 0000:00:00.0: AER: Device recovery failed NOTE BENE: these issues do not occur at all with a USB2 Flash drive. [ 2093.564771] usb 1-2: new high-speed USB device number 2 using xhci_hcd [ 2093.790646] usb 1-2: New USB device found, idVendor=058f, idProduct=6387 [ 2093.797397] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3 [ 2093.804583] usb 1-2: Product: Mass Storage [ 2093.808707] usb 1-2: Manufacturer: Generic [ 2093.812829] usb 1-2: SerialNumber: 31A69E70 [ 2093.819244] usb-storage 1-2:1.0: USB Mass Storage device detected [ 2093.825624] scsi host0: usb-storage 1-2:1.0 [ 2094.856918] scsi 0:0:0:0: Direct-Access Generic Flash Disk 8.07 PQ: 0 ANSI: 2 [ 2094.866196] sd 0:0:0:0: [sda] 4106240 512-byte logical blocks: (2.10 GB/1.96 GiB) [ 2094.874232] sd 0:0:0:0: [sda] Write Protect is off [ 2094.879350] sd 0:0:0:0: [sda] No Caching mode page found [ 2094.884816] sd 0:0:0:0: [sda] Assuming drive cache: write through [ 2094.909111] sda: sda1 [ 2094.912935] sd 0:0:0:0: [sda] Attached SCSI removable disk [ 2100.516396] usb 1-2: USB disconnect, device number 2 Regards.
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 51cd4b8..ade2ad6 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -922,7 +922,8 @@ void xhci_hc_died(struct xhci_hcd *xhci) if (xhci->xhc_state & XHCI_STATE_DYING) return; - xhci_err(xhci, "xHCI host controller not responding, assume dead\n"); + xhci_err(xhci, "xHC not responding in %pf, assume controller is dead\n", + __builtin_return_address(0)); xhci->xhc_state |= XHCI_STATE_DYING; xhci_cleanup_command_queue(xhci);