diff mbox

Possible regression between 4.9 and 4.13

Message ID 599D3410.9050504@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Mathias Nyman Aug. 23, 2017, 7:51 a.m. UTC
On 23.08.2017 09:07, Felipe Balbi wrote:
>
> Hi,
>
> Mason <slash.tmp@free.fr> writes:
>> Hello,
>>
>> The driver for my system's PCIe host bridge landed recently
>> (in 4.13) but it was developed on 4.9
>>
>> I tested the PCIe host bridge by plugging a 4-port USB3 adapter
>> into the PCIe slot (system at rest) and plugging an USB3 Flash
>> drive into the USB3 adapter (at run-time).
>>
>> On 4.9, the setup works (almost perfectly, see below).
>> On 4.13, once I unplug the Flash drive, the controller port
>> remains unresponsive.
>>
>>
>> On 4.9, I said *almost* perfectly, because the pcieport driver
>> does report a few non-fatal errors when I unplug:
>>
>> [  193.838504] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
>> [  193.878081] usb-storage 2-2:1.0: USB Mass Storage device detected
>> [  193.884547] scsi host0: usb-storage 2-2:1.0
>> [  194.907936] scsi 0:0:0:0: Direct-Access     Kingston DataTraveler 3.0      PQ: 0 ANSI: 6
>> [  194.920296] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
>> [  194.928666] sd 0:0:0:0: [sda] Write Protect is off
>> [  194.933755] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
>> [  194.946074]  sda: sda1
>> [  194.953608] sd 0:0:0:0: [sda] Attached SCSI removable disk
>>
>> [  208.930260] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
>> [  208.938342] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
>> [  208.950163] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
>> [  208.958577] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
>> [  208.965432] pcieport 0000:00:00.0: AER: Device recovery failed
>> [  209.663733] xhci_hcd 0000:01:00.0: Cannot set link state.
>> [  209.669194] usb usb2-port2: cannot disable (err = -32)
>> [  209.674376] usb 2-2: USB disconnect, device number 2
>> [  209.680481] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
>> [  209.688689] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
>> [  209.700555] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
>> [  209.708978] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
>> [  209.715845] pcieport 0000:00:00.0: AER: Device recovery failed
>> [  209.721722] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
>> [  209.729785] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
>> [  209.741602] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
>> [  209.750027] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
>> [  209.756866] pcieport 0000:00:00.0: AER: Device recovery failed
>>
>> After that, I can still plug the drive into the same port.
>>
>> But on 4.13, I get
>>
>> [   27.330378] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
>> [   27.369383] usb-storage 2-2:1.0: USB Mass Storage device detected
>> [   27.375840] scsi host0: usb-storage 2-2:1.0
>> [   28.403035] scsi 0:0:0:0: Direct-Access     Kingston DataTraveler 3.0      PQ: 0 ANSI: 6
>> [   28.413326] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
>> [   28.423653] sd 0:0:0:0: [sda] Write Protect is off
>> [   28.429139] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
>> [   28.441529]  sda: sda1
>> [   28.449431] sd 0:0:0:0: [sda] Attached SCSI removable disk
>>
>> [   90.592134] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
>> [   90.599857] xhci_hcd 0000:01:00.0: HC died; cleaning up
>> [   90.605336] usb 2-2: USB disconnect, device number 2
>> [   90.630414] udevd[955]: inotify_add_watch(6, /dev/sda, 10) failed: No such file or directory
>>
>> Trying to replug into the same port = nothing happens
>> (Linux did say "assume dead")
>>
>> Any idea what could have changed between 4.9 and 4.13 ?
>>
>
> Quite a bit:
>
> $ git rev-list --no-merges  --count v4.13-rc6 ^v4.9 -- drivers/usb/host/xhci drivers/usb/core/
> 58
>

very likely cause is the more aggressive detection of pci removed xhci hosts

See commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
     xhci: Rework how we handle unresponsive or hoptlug removed hosts

It checks if a xhci register reads returns 0xffffffff and assumes xhci
died in that case.

Could you add something like the below to check which what is killing the host?
Or a BUG()/WARN() in xhci_hc_died() to get a backtrace of who called it.



Thanks
Mathias

Comments

Mason Aug. 23, 2017, 9:18 a.m. UTC | #1
On 23/08/2017 09:51, Mathias Nyman wrote:

> On 23.08.2017 09:07, Felipe Balbi wrote:
>
>> Mason writes:
>>
>>> Any idea what could have changed between 4.9 and 4.13 ?
>>
>> Quite a bit:
>>
>> $ git rev-list --no-merges  --count v4.13-rc6 ^v4.9 -- drivers/usb/host/xhci drivers/usb/core/
>> 58
> 
> very likely cause is the more aggressive detection of pci removed xhci hosts
> 
> See commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
>      xhci: Rework how we handle unresponsive or hoptlug removed hosts
> 
> It checks if a xhci register reads returns 0xffffffff and assumes xhci
> died in that case.
> 
> Could you add something like the below to check which what is killing the host?
> Or a BUG()/WARN() in xhci_hc_died() to get a backtrace of who called it.
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 51cd4b8..ade2ad6 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -922,7 +922,8 @@ void xhci_hc_died(struct xhci_hcd *xhci)
>          if (xhci->xhc_state & XHCI_STATE_DYING)
>                  return;
>   
> -       xhci_err(xhci, "xHCI host controller not responding, assume dead\n");
> +       xhci_err(xhci, "xHC not responding in %pf, assume controller is dead\n",
> +                __builtin_return_address(0));
>          xhci->xhc_state |= XHCI_STATE_DYING;
>   
>          xhci_cleanup_command_queue(xhci);

I'll try some coarse bisection to narrow it down.

$ git describe --contains d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
v4.12-rc1~97^2~39

I'll check 4.11 first.

I wanted to mention that the XHCI setup on 4.9 and 4.13 print
slightly different things (at the beginning).

On 4.9
[    1.240322] xhci_hcd 0000:01:00.0: xHCI Host Controller
[    1.245617] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[    1.258691] xhci_hcd 0000:01:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010
[    1.268090] hub 1-0:1.0: USB hub found
[    1.271905] hub 1-0:1.0: 4 ports detected
[    1.276372] xhci_hcd 0000:01:00.0: xHCI Host Controller
[    1.281645] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
[    1.289173] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    1.297775] hub 2-0:1.0: USB hub found
[    1.301577] hub 2-0:1.0: 4 ports detected
[    1.306194] usbcore: registered new interface driver usb-storage

On 4.13
[    1.222471] pcieport 0000:00:00.0: of_irq_parse_pci: failed with rc=-22
[    1.229156] xhci_hcd 0000:01:00.0: Resetting
[    2.268836] xhci_hcd 0000:01:00.0: xHCI Host Controller
[    2.274126] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[    2.287222] xhci_hcd 0000:01:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010
[    2.296653] hub 1-0:1.0: USB hub found
[    2.300478] hub 1-0:1.0: 4 ports detected
[    2.304962] xhci_hcd 0000:01:00.0: xHCI Host Controller
[    2.310246] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
[    2.317776] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    2.326419] hub 2-0:1.0: USB hub found
[    2.330229] hub 2-0:1.0: 4 ports detected
[    2.334869] usbcore: registered new interface driver usb-storage

FWIW, "of_irq_parse_pci: failed with rc=-22"
seems to come from:

[    1.257411] [<c03d80c8>] (of_irq_parse_pci) from [<c03d8270>] (of_irq_parse_and_map_pci+0x10/0x2c)
[    1.266420] [<c03d8270>] (of_irq_parse_and_map_pci) from [<c03100a8>] (pci_assign_irq+0x78/0xb0)
[    1.275254] [<c03100a8>] (pci_assign_irq) from [<c030a1c8>] (pci_device_probe+0x18/0x128)
[    1.283476] [<c030a1c8>] (pci_device_probe) from [<c0357864>] (driver_probe_device+0x244/0x2c8)

The error logging was added by f1aa54840657f
No, that just turned one specific error into a warning.
Need to dig a bit more.

Regards.
Mason Aug. 23, 2017, 9:31 a.m. UTC | #2
On 23/08/2017 09:51, Mathias Nyman wrote:

> very likely cause is the more aggressive detection of pci removed xhci hosts
> 
> See commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
>      xhci: Rework how we handle unresponsive or hoptlug removed hosts
> 
> It checks if a xhci register reads returns 0xffffffff and assumes xhci
> died in that case.
> 
> Could you add something like the below to check which what is killing the host?
> Or a BUG()/WARN() in xhci_hc_died() to get a backtrace of who called it.

[   46.525247] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[   46.565496] usb-storage 2-2:1.0: USB Mass Storage device detected
[   46.571934] scsi host0: usb-storage 2-2:1.0
[   47.601227] scsi 0:0:0:0: Direct-Access     Kingston DataTraveler 3.0      PQ: 0 ANSI: 6
[   47.611340] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[   47.621624] sd 0:0:0:0: [sda] Write Protect is off
[   47.627131] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   47.639637]  sda: sda1
[   47.648091] sd 0:0:0:0: [sda] Attached SCSI removable disk
[   58.100306] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[   58.108021] CPU: 0 PID: 939 Comm: kworker/0:2 Tainted: G         C      4.13.0-rc6 #11
[   58.115976] Hardware name: Sigma Tango DT
[   58.120016] Workqueue: usb_hub_wq hub_event
[   58.124241] [<c010f288>] (unwind_backtrace) from [<c010af58>] (show_stack+0x10/0x14)
[   58.132033] [<c010af58>] (show_stack) from [<c049d714>] (dump_stack+0x84/0x98)
[   58.139302] [<c049d714>] (dump_stack) from [<c03b090c>] (xhci_hc_died.part.9+0x50/0x23c)
[   58.147438] [<c03b090c>] (xhci_hc_died.part.9) from [<c03b5d80>] (xhci_hub_control+0xf3c/0x175c)
[   58.156273] [<c03b5d80>] (xhci_hub_control) from [<c03934a4>] (usb_hcd_submit_urb+0x264/0x814)
[   58.164932] [<c03934a4>] (usb_hcd_submit_urb) from [<c0394fa4>] (usb_start_wait_urb+0x4c/0xbc)
[   58.173591] [<c0394fa4>] (usb_start_wait_urb) from [<c03950b4>] (usb_control_msg+0xa0/0xcc)
[   58.181985] [<c03950b4>] (usb_control_msg) from [<c038bf54>] (usb_clear_port_feature+0x44/0x4c)
[   58.190730] [<c038bf54>] (usb_clear_port_feature) from [<c038c320>] (hub_port_reset+0x228/0x51c)
[   58.199561] [<c038c320>] (hub_port_reset) from [<c038fd68>] (hub_event+0x87c/0x108c)
[   58.207349] [<c038fd68>] (hub_event) from [<c012ecc4>] (process_one_work+0x1d8/0x3f0)
[   58.215220] [<c012ecc4>] (process_one_work) from [<c012f8d8>] (worker_thread+0x38/0x554)
[   58.223354] [<c012f8d8>] (worker_thread) from [<c01347d0>] (kthread+0x108/0x138)
[   58.230789] [<c01347d0>] (kthread) from [<c01076d8>] (ret_from_fork+0x14/0x3c)
[   58.238056] xhci_hcd 0000:01:00.0: HC died; cleaning up
[   58.243391] usb 2-2: USB disconnect, device number 2
Mason Aug. 23, 2017, 10:19 a.m. UTC | #3
On 23/08/2017 09:51, Mathias Nyman wrote:

> very likely cause is the more aggressive detection of pci removed xhci hosts
> 
> See commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
>      xhci: Rework how we handle unresponsive or hoptlug removed hosts
> 
> It checks if a xhci register reads returns 0xffffffff and assumes xhci
> died in that case.

I've just tested 4.11.12 + a few local patches to back-port
PCIe host bridge support.

It "works" as well as 4.9
(i.e. modulo the "AER: Uncorrected (Non-Fatal) error received")

[    0.508533] pcie_tango 50000000.pcie: simultaneous PCI config and MMIO accesses may cause data corruption
[    0.519622] OF: PCI: host bridge /soc/pcie@2e000 ranges:
[    0.519645] OF: PCI:   MEM 0x50400000..0x53ffffff -> 0x00400000
[    0.519725] pcie_tango 50000000.pcie: ECAM at [mem 0x50000000-0x503fffff] for [bus 00-03]
[    0.519872] pcie_tango 50000000.pcie: PCI host bridge to bus 0000:00
[    0.519886] pci_bus 0000:00: root bus resource [bus 00-03]
[    0.519898] pci_bus 0000:00: root bus resource [mem 0x50400000-0x53ffffff] (bus address [0x00400000-0x03ffffff])
[    0.520201] PCI: bus0: Fast back to back transfers disabled
[    0.520213] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[    0.520922] PCI: bus1: Fast back to back transfers disabled
[    0.520964] pci 0000:00:00.0: of_irq_parse_pci: failed with rc=-22
[    0.520993] pci 0000:00:00.0: BAR 8: assigned [mem 0x50400000-0x504fffff]
[    0.521004] pci 0000:01:00.0: BAR 0: assigned [mem 0x50400000-0x50401fff 64bit]
[    0.521025] pci 0000:00:00.0: PCI bridge to [bus 01]
[    0.521033] pci 0000:00:00.0:   bridge window [mem 0x50400000-0x504fffff]
[    0.521085] pcieport 0000:00:00.0: enabling device (0140 -> 0142)
[    0.521282] pcieport 0000:00:00.0: Signaling PME with IRQ 30
[    0.521402] pcieport 0000:00:00.0: AER enabled with IRQ 30
[    0.521526] pci 0000:01:00.0: enabling device (0140 -> 0142)
...
[    1.239706] xhci_hcd 0000:01:00.0: xHCI Host Controller
[    1.244998] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[    1.258048] xhci_hcd 0000:01:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010
[    1.267467] hub 1-0:1.0: USB hub found
[    1.271287] hub 1-0:1.0: 4 ports detected
[    1.275761] xhci_hcd 0000:01:00.0: xHCI Host Controller
[    1.281048] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
[    1.288578] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    1.297234] hub 2-0:1.0: USB hub found
[    1.301042] hub 2-0:1.0: 4 ports detected
[    1.305681] usbcore: registered new interface driver usb-storage


PLUG #1
[   26.104607] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[   26.143799] usb-storage 2-2:1.0: USB Mass Storage device detected
[   26.150253] scsi host0: usb-storage 2-2:1.0
[   27.177298] scsi 0:0:0:0: Direct-Access     Kingston DataTraveler 3.0      PQ: 0 ANSI: 6
[   27.187586] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[   27.199000] sd 0:0:0:0: [sda] Write Protect is off
[   27.204186] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   27.216322]  sda: sda1
[   27.220584] sd 0:0:0:0: [sda] Attached SCSI removable disk
[   27.252046] random: fast init done

UNPLUG #1
[   37.334040] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   37.342135] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   37.353970] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   37.362589] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   37.369485] pcieport 0000:00:00.0: AER: Device recovery failed
[   38.066538] xhci_hcd 0000:01:00.0: Cannot set link state.
[   38.072039] usb usb2-port2: cannot disable (err = -32)
[   38.077348] usb 2-2: USB disconnect, device number 2
[   38.082711] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   38.094279] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   38.108006] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   38.116878] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   38.123954] pcieport 0000:00:00.0: AER: Device recovery failed

PLUG #2
[   55.097922] usb 2-2: new SuperSpeed USB device number 3 using xhci_hcd
[   55.137590] usb-storage 2-2:1.0: USB Mass Storage device detected
[   55.144016] scsi host0: usb-storage 2-2:1.0
[   56.163907] scsi 0:0:0:0: Direct-Access     Kingston DataTraveler 3.0      PQ: 0 ANSI: 6
[   56.174851] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[   56.184218] sd 0:0:0:0: [sda] Write Protect is off
[   56.190162] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   56.202117]  sda: sda1
[   56.207112] sd 0:0:0:0: [sda] Attached SCSI removable disk

UNPLUG #2
[   63.228310] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   63.236403] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   63.248220] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   63.256653] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   63.263523] pcieport 0000:00:00.0: AER: Device recovery failed
[   63.959768] xhci_hcd 0000:01:00.0: Cannot set link state.
[   63.965227] usb usb2-port2: cannot disable (err = -32)
[   63.970409] usb 2-2: USB disconnect, device number 3
[   63.975664] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   63.987356] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   64.000021] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   64.008655] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   64.015553] pcieport 0000:00:00.0: AER: Device recovery failed
[   64.021449] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   64.029580] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   64.041410] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   64.049818] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   64.056658] pcieport 0000:00:00.0: AER: Device recovery failed


Bjorn,

What do you make of the AER logs?
What can I do to debug this issue?

Regards.



FWIW, verbose lspci output below.

# lspci -vv
00:00.0 PCI bridge: Sigma Designs, Inc. Device 0024 (rev 01) (prog-if 00 [Normal decode])
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin ? routed to IRQ 30
        Region 0: Memory at <ignored> (64-bit, non-prefetchable)
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        I/O behind bridge: 00000000-00000fff
        Memory behind bridge: 00400000-004fffff
        Prefetchable memory behind bridge: 00000000-000fffff
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity+ SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [50] MSI: Enable+ Count=1/4 Maskable- 64bit+
                Address: 00000000a002e07c  Data: 0000
        Capabilities: [78] Power Management version 3
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=3 PME-
        Capabilities: [80] Express (v2) Root Port (Slot-), MSI 03
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr+ UncorrErr+ FatalErr- UnsuppReq- AuxPwr- TransPend+
                LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <2us, L1 <4us
                        ClockPM- Surprise- LLActRep- BwNot+ ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
                RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
                RootCap: CRSVisible-
                RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                DevCap2: Completion Timeout: Range B, TimeoutDis-, LTR-, OBFF Not Supported ARIFwd-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled ARIFwd-
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [800 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO+ CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 0e, GenCap- CGenEn- ChkCap- ChkEn-
        Kernel driver in use: pcieport

01:00.0 USB controller: Renesas Technology Corp. uPD720201 USB 3.0 Host Controller (rev 03) (prog-if 30 [XHCI])
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 0
        Region 0: Memory at 50400000 (64-bit, non-prefetchable) [size=8K]
        Capabilities: [50] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [70] MSI: Enable- Count=1/8 Maskable- 64bit+
                Address: 0000000000000000  Data: 0000
        Capabilities: [90] MSI-X: Enable+ Count=8 Masked-
                Vector table: BAR=0 offset=00001000
                PBA: BAR=0 offset=00001080
        Capabilities: [a0] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 unlimited
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [150 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Kernel driver in use: xhci_hcd
Mathias Nyman Aug. 23, 2017, 11:11 a.m. UTC | #4
On 23.08.2017 12:31, Mason wrote:
> On 23/08/2017 09:51, Mathias Nyman wrote:
>
>> very likely cause is the more aggressive detection of pci removed xhci hosts
>>
>> See commit d9f11ba9f107aa335091ab8d7ba5eea714e46e8b
>>       xhci: Rework how we handle unresponsive or hoptlug removed hosts
>>
>> It checks if a xhci register reads returns 0xffffffff and assumes xhci
>> died in that case.
>>
>> Could you add something like the below to check which what is killing the host?
>> Or a BUG()/WARN() in xhci_hc_died() to get a backtrace of who called it.
>
> [   46.525247] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
> [   46.565496] usb-storage 2-2:1.0: USB Mass Storage device detected
> [   46.571934] scsi host0: usb-storage 2-2:1.0
> [   47.601227] scsi 0:0:0:0: Direct-Access     Kingston DataTraveler 3.0      PQ: 0 ANSI: 6
> [   47.611340] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
> [   47.621624] sd 0:0:0:0: [sda] Write Protect is off
> [   47.627131] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> [   47.639637]  sda: sda1
> [   47.648091] sd 0:0:0:0: [sda] Attached SCSI removable disk
> [   58.100306] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
> [   58.108021] CPU: 0 PID: 939 Comm: kworker/0:2 Tainted: G         C      4.13.0-rc6 #11
> [   58.115976] Hardware name: Sigma Tango DT
> [   58.120016] Workqueue: usb_hub_wq hub_event
> [   58.124241] [<c010f288>] (unwind_backtrace) from [<c010af58>] (show_stack+0x10/0x14)
> [   58.132033] [<c010af58>] (show_stack) from [<c049d714>] (dump_stack+0x84/0x98)
> [   58.139302] [<c049d714>] (dump_stack) from [<c03b090c>] (xhci_hc_died.part.9+0x50/0x23c)
> [   58.147438] [<c03b090c>] (xhci_hc_died.part.9) from [<c03b5d80>] (xhci_hub_control+0xf3c/0x175c)
> [   58.156273] [<c03b5d80>] (xhci_hub_control) from [<c03934a4>] (usb_hcd_submit_urb+0x264/0x814)
> [   58.164932] [<c03934a4>] (usb_hcd_submit_urb) from [<c0394fa4>] (usb_start_wait_urb+0x4c/0xbc)
> [   58.173591] [<c0394fa4>] (usb_start_wait_urb) from [<c03950b4>] (usb_control_msg+0xa0/0xcc)
> [   58.181985] [<c03950b4>] (usb_control_msg) from [<c038bf54>] (usb_clear_port_feature+0x44/0x4c)
> [   58.190730] [<c038bf54>] (usb_clear_port_feature) from [<c038c320>] (hub_port_reset+0x228/0x51c)
> [   58.199561] [<c038c320>] (hub_port_reset) from [<c038fd68>] (hub_event+0x87c/0x108c)
> [   58.207349] [<c038fd68>] (hub_event) from [<c012ecc4>] (process_one_work+0x1d8/0x3f0)
> [   58.215220] [<c012ecc4>] (process_one_work) from [<c012f8d8>] (worker_thread+0x38/0x554)
> [   58.223354] [<c012f8d8>] (worker_thread) from [<c01347d0>] (kthread+0x108/0x138)
> [   58.230789] [<c01347d0>] (kthread) from [<c01076d8>] (ret_from_fork+0x14/0x3c)
> [   58.238056] xhci_hcd 0000:01:00.0: HC died; cleaning up
> [   58.243391] usb 2-2: USB disconnect, device number 2
> --

xhci driver reads 0xffffffff from a mmio mapped xhci portsc register and bails out in:
xhci-hub.c:
         temp = readl(port_array[wIndex]);
                 if (temp == ~(u32)0) {
                         xhci_hc_died(xhci);
			retval = -ENODEV;
	                break;
		}

In this case we read the register when hub thread asks to clear port feature.

why portsc returns 0xffffffff is a nother quiestion, could the hub thread be running while xhci controller is (in D3)?
Was xhci runtime suspended?
There were some pcieport errors in another log you showed, maybe PCI devices are not properly recovered
and the registers return 0xffffffff?

-Mathias
Mason Aug. 23, 2017, 11:54 a.m. UTC | #5
On 23/08/2017 13:11, Mathias Nyman wrote:

> On 23.08.2017 12:31, Mason wrote:
> 
>> [   46.525247] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
>> [   46.565496] usb-storage 2-2:1.0: USB Mass Storage device detected
>> [   46.571934] scsi host0: usb-storage 2-2:1.0
>> [   47.601227] scsi 0:0:0:0: Direct-Access     Kingston DataTraveler 3.0      PQ: 0 ANSI: 6
>> [   47.611340] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
>> [   47.621624] sd 0:0:0:0: [sda] Write Protect is off
>> [   47.627131] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
>> [   47.639637]  sda: sda1
>> [   47.648091] sd 0:0:0:0: [sda] Attached SCSI removable disk
>> [   58.100306] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
>> [   58.108021] CPU: 0 PID: 939 Comm: kworker/0:2 Tainted: G         C      4.13.0-rc6 #11
>> [   58.115976] Hardware name: Sigma Tango DT
>> [   58.120016] Workqueue: usb_hub_wq hub_event
>> [   58.124241] [<c010f288>] (unwind_backtrace) from [<c010af58>] (show_stack+0x10/0x14)
>> [   58.132033] [<c010af58>] (show_stack) from [<c049d714>] (dump_stack+0x84/0x98)
>> [   58.139302] [<c049d714>] (dump_stack) from [<c03b090c>] (xhci_hc_died.part.9+0x50/0x23c)
>> [   58.147438] [<c03b090c>] (xhci_hc_died.part.9) from [<c03b5d80>] (xhci_hub_control+0xf3c/0x175c)
>> [   58.156273] [<c03b5d80>] (xhci_hub_control) from [<c03934a4>] (usb_hcd_submit_urb+0x264/0x814)
>> [   58.164932] [<c03934a4>] (usb_hcd_submit_urb) from [<c0394fa4>] (usb_start_wait_urb+0x4c/0xbc)
>> [   58.173591] [<c0394fa4>] (usb_start_wait_urb) from [<c03950b4>] (usb_control_msg+0xa0/0xcc)
>> [   58.181985] [<c03950b4>] (usb_control_msg) from [<c038bf54>] (usb_clear_port_feature+0x44/0x4c)
>> [   58.190730] [<c038bf54>] (usb_clear_port_feature) from [<c038c320>] (hub_port_reset+0x228/0x51c)
>> [   58.199561] [<c038c320>] (hub_port_reset) from [<c038fd68>] (hub_event+0x87c/0x108c)
>> [   58.207349] [<c038fd68>] (hub_event) from [<c012ecc4>] (process_one_work+0x1d8/0x3f0)
>> [   58.215220] [<c012ecc4>] (process_one_work) from [<c012f8d8>] (worker_thread+0x38/0x554)
>> [   58.223354] [<c012f8d8>] (worker_thread) from [<c01347d0>] (kthread+0x108/0x138)
>> [   58.230789] [<c01347d0>] (kthread) from [<c01076d8>] (ret_from_fork+0x14/0x3c)
>> [   58.238056] xhci_hcd 0000:01:00.0: HC died; cleaning up
>> [   58.243391] usb 2-2: USB disconnect, device number 2
> 
> xhci driver reads 0xffffffff from a mmio mapped xhci portsc register and bails out in:
> xhci-hub.c:
>          temp = readl(port_array[wIndex]);
>                  if (temp == ~(u32)0) {
>                          xhci_hc_died(xhci);
> 			retval = -ENODEV;
> 	                break;
> 		}
> 
> In this case we read the register when hub thread asks to clear port feature.
> 
> why portsc returns 0xffffffff is a another question, could the hub thread be running while xhci controller is (in D3)?
> Was xhci runtime suspended?

How do I tell?
Should I disable SUSPEND support and all kinds of power management?

> There were some pcieport errors in another log you showed, maybe PCI devices are not properly recovered
> and the registers return 0xffffffff?

FWIW, I just compiled v4.12-rc1 and I do get the broken behavior.

v4.11.12 = OK
v4.12-rc1 = KO

PLUG
[   17.226953] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[   17.267195] usb-storage 2-2:1.0: USB Mass Storage device detected
[   17.273612] scsi host0: usb-storage 2-2:1.0
[   18.296369] scsi 0:0:0:0: Direct-Access     Kingston DataTraveler 3.0      PQ: 0 ANSI: 6
[   18.307772] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[   18.316991] sd 0:0:0:0: [sda] Write Protect is off
[   18.322588] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   18.334828]  sda: sda1
[   18.339507] sd 0:0:0:0: [sda] Attached SCSI removable disk
[   18.366202] random: fast init done

UNPLUG
[   21.314111] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   21.322219] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   21.334039] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   21.342453] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   21.349306] pcieport 0000:00:00.0: AER: Device recovery failed
[   22.055471] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[   22.063187] xhci_hcd 0000:01:00.0: HC died; cleaning up
[   22.068523] usb 2-2: USB disconnect, device number 2
[   22.073774] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   22.085369] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   22.098823] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   22.107245] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   22.114130] pcieport 0000:00:00.0: AER: Device recovery failed
[   22.120026] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   22.128096] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   22.139916] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   22.148320] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   22.155162] pcieport 0000:00:00.0: AER: Device recovery failed


The defconfig I used for testing:

# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_NO_HZ_IDLE=y
CONFIG_HIGH_RES_TIMERS=y
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_ARCH_TANGO=y
# CONFIG_ARM_ERRATA_643719 is not set
CONFIG_PCI=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCI_MSI=y
CONFIG_PCIE_TANGO_SMP8759=y
CONFIG_SMP=y
CONFIG_PREEMPT=y
CONFIG_HZ_300=y
CONFIG_AEABI=y
CONFIG_HIGHMEM=y
# CONFIG_ATAGS is not set
CONFIG_ARM_APPENDED_DTB=y
CONFIG_ARM_ATAG_DTB_COMPAT=y
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPUFREQ_DT=y
CONFIG_VFP=y
CONFIG_NEON=y
CONFIG_NET=y
CONFIG_PACKET=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
# CONFIG_IPV6 is not set
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_NETDEVICES=y
CONFIG_NET_VENDOR_AURORA=y
CONFIG_AURORA_NB8800=y
CONFIG_AT803X_PHY=y
# CONFIG_WLAN is not set
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_SERIO is not set
CONFIG_SERIAL_8250=y
# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_RT288X=y
CONFIG_SERIAL_OF_PLATFORM=y
# CONFIG_HW_RANDOM is not set
CONFIG_I2C=y
CONFIG_I2C_XLR=y
CONFIG_GPIOLIB=y
CONFIG_THERMAL=y
CONFIG_CPU_THERMAL=y
CONFIG_TANGO_THERMAL=y
CONFIG_WATCHDOG=y
CONFIG_TANGOX_WATCHDOG=y
CONFIG_FB=y
# CONFIG_HID is not set
# CONFIG_USB_HID is not set
CONFIG_USB=y
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_STORAGE=y
CONFIG_EXT4_FS=y
CONFIG_FUSE_FS=m
CONFIG_VFAT_FS=m
CONFIG_TMPFS=y
CONFIG_NFS_FS=y
# CONFIG_NFS_V2 is not set
CONFIG_ROOT_NFS=y
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_UTF8=m
CONFIG_PRINTK_TIME=y
# CONFIG_CRYPTO_ECHAINIV is not set
Mason Aug. 23, 2017, 12:41 p.m. UTC | #6
On 23/08/2017 13:54, Mason wrote:

> On 23/08/2017 13:11, Mathias Nyman wrote:
> 
>> In this case we read the register when hub thread asks to clear port feature.
>>
>> why portsc returns 0xffffffff is a another question, could the hub thread be running while xhci controller is (in D3)?
>> Was xhci runtime suspended?
> 
> How do I tell?
> Should I disable SUSPEND support and all kinds of power management?

I compiled a minimal kernel, with lots of irrelevant drivers and
frameworks left out, including power management. I still get the
"xHCI host controller not responding, assume dead" issue.

PLUG
[   59.803499] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[   59.836902] usb 2-2: New USB device found, idVendor=0951, idProduct=1666
[   59.843653] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   59.850900] usb 2-2: Product: DataTraveler 3.0
[   59.855417] usb 2-2: Manufacturer: Kingston
[   59.859661] usb 2-2: SerialNumber: 002618887865F0C0F8646BFA
[   59.868249] usb-storage 2-2:1.0: USB Mass Storage device detected
[   59.874691] scsi host0: usb-storage 2-2:1.0
[   60.882801] scsi 0:0:0:0: Direct-Access     Kingston DataTraveler 3.0      PQ: 0 ANSI: 6
[   60.891640] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[   60.899662] sd 0:0:0:0: [sda] Write Protect is off
[   60.904763] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   60.916154]  sda: sda1
[   60.919798] sd 0:0:0:0: [sda] Attached SCSI removable disk

UNPLUG
[   70.545087] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   70.553169] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   70.565084] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   70.573528] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   70.580402] pcieport 0000:00:00.0: AER: Device recovery failed

[   71.275253] xhci_hcd 0000:01:00.0: xHCI host controller not responding, assume dead
[   71.282956] xhci_hcd 0000:01:00.0: HC died; cleaning up
[   71.288304] usb 2-2: USB disconnect, device number 2

[   71.293445] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   71.301851] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   71.313785] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   71.322240] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   71.329115] pcieport 0000:00:00.0: AER: Device recovery failed

[   71.335042] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   71.343137] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   71.354984] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   71.363443] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   71.370289] pcieport 0000:00:00.0: AER: Device recovery failed


defconfig for reference

# CONFIG_SWAP is not set
CONFIG_SYSVIPC=y
CONFIG_NO_HZ_IDLE=y
CONFIG_HIGH_RES_TIMERS=y
# CONFIG_COMPAT_BRK is not set
CONFIG_SLAB=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODVERSIONS=y
CONFIG_ARCH_TANGO=y
# CONFIG_ARM_ERRATA_643719 is not set
CONFIG_PCI=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCI_MSI=y
CONFIG_PCIE_TANGO_SMP8759=y
CONFIG_SMP=y
CONFIG_PREEMPT=y
CONFIG_HZ_300=y
CONFIG_AEABI=y
CONFIG_HIGHMEM=y
# CONFIG_ATAGS is not set
CONFIG_ARM_APPENDED_DTB=y
CONFIG_ARM_ATAG_DTB_COMPAT=y
CONFIG_VFP=y
CONFIG_NEON=y
# CONFIG_SUSPEND is not set
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_BLK_DEV_LOOP=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
# CONFIG_INPUT_KEYBOARD is not set
# CONFIG_INPUT_MOUSE is not set
# CONFIG_SERIO is not set
CONFIG_SERIAL_8250=y
# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_SERIAL_8250_RT288X=y
CONFIG_SERIAL_OF_PLATFORM=y
# CONFIG_HW_RANDOM is not set
# CONFIG_HWMON is not set
# CONFIG_HID is not set
# CONFIG_USB_HID is not set
CONFIG_USB=y
CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
CONFIG_USB_XHCI_HCD=y
CONFIG_USB_STORAGE=y
CONFIG_VFAT_FS=m
CONFIG_TMPFS=y
CONFIG_NLS_CODEPAGE_437=m
CONFIG_NLS_ISO8859_1=m
CONFIG_NLS_UTF8=m
CONFIG_PRINTK_TIME=y
Mason Aug. 23, 2017, 2:30 p.m. UTC | #7
On 23/08/2017 14:41, Mason wrote:

> I compiled a minimal kernel, with lots of irrelevant drivers and
> frameworks left out, including power management. I still get the
> "xHCI host controller not responding, assume dead" issue.

The problem seems to have a timing-related aspect.

I added a bunch of logs (to a slow serial console) and the HC was
not killed. I was able to plug the Flash drive a second time.
(I am logging config space reads and writes.)

[    1.098314]   READ: bus=1 devfn=0 where=84 size=2 val=0x8
[    1.103779]   READ: bus=1 devfn=0 where=4 size=2 val=0x142
[    1.109315]   READ: bus=1 devfn=0 where=61 size=1 val=0x1
[    1.114746]   READ: bus=1 devfn=0 where=4 size=2 val=0x142
[    1.120311]   READ: bus=1 devfn=0 where=4 size=2 val=0x142
[    1.125841]  WRITE: bus=1 devfn=0 where=4 size=2 val=0x146

NB: I added msleep(2500) in usb_add_hcd()

[    3.681867] xhci_hcd 0000:01:00.0: xHCI Host Controller
[    3.687154] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1
[    3.694656]   READ: bus=1 devfn=0 where=96 size=1 val=0x30
[    3.705736] xhci_hcd 0000:01:00.0: hcc params 0x014051cf hci version 0x100 quirks 0x00000010
[    3.714233]   READ: bus=1 devfn=0 where=12 size=1 val=0x10
[    3.719752]   READ: bus=1 devfn=0 where=4 size=2 val=0x146
[    3.725269]  WRITE: bus=1 devfn=0 where=4 size=2 val=0x156
[    3.730794]   READ: bus=1 devfn=0 where=146 size=2 val=0x7
[    3.736314]   READ: bus=1 devfn=0 where=146 size=2 val=0x7
[    3.741835]  WRITE: bus=1 devfn=0 where=146 size=2 val=0x7
[    3.747354]   READ: bus=1 devfn=0 where=146 size=2 val=0x7
[    3.752871]   READ: bus=1 devfn=0 where=148 size=4 val=0x1000
[    3.758775]   READ: bus=1 devfn=0 where=146 size=2 val=0x7
[    3.764297]  WRITE: bus=1 devfn=0 where=146 size=2 val=0xc007
[    3.770108]   READ: bus=1 devfn=0 where=4 size=2 val=0x146
[    3.775626]  WRITE: bus=1 devfn=0 where=4 size=2 val=0x546
[    3.781146]   READ: bus=1 devfn=0 where=146 size=2 val=0xc007
[    3.786925]  WRITE: bus=1 devfn=0 where=146 size=2 val=0x8007
[    3.792919] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[    3.799756] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    3.807021] usb usb1: Product: xHCI Host Controller
[    3.811933] usb usb1: Manufacturer: Linux 4.12.0-rc1 xhci-hcd
[    3.817713] usb usb1: SerialNumber: 0000:01:00.0
[    3.822773] hub 1-0:1.0: USB hub found
[    3.826598] hub 1-0:1.0: 4 ports detected

NB: I added msleep(2500) in usb_add_hcd()

[    6.455246] xhci_hcd 0000:01:00.0: xHCI Host Controller
[    6.460520] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
[    6.468028] usb usb2: We don't know the algorithms for LPM for this host, disabling LPM.
[    6.476236] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003
[    6.483068] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    6.490334] usb usb2: Product: xHCI Host Controller
[    6.495240] usb usb2: Manufacturer: Linux 4.12.0-rc1 xhci-hcd
[    6.501020] usb usb2: SerialNumber: 0000:01:00.0
[    6.505994] hub 2-0:1.0: USB hub found
[    6.509806] hub 2-0:1.0: 4 ports detected
[    6.514215] usbcore: registered new interface driver usb-storage
[    6.520313] Registering SWP/SWPB emulation handler
[    6.525541]   READ: bus=0 devfn=0 where=132 size=4 val=0x8001
[    6.531334]   READ: bus=0 devfn=0 where=6 size=2 val=0x4010
[    6.536955]   READ: bus=0 devfn=0 where=52 size=1 val=0x50
[    6.542484]   READ: bus=0 devfn=0 where=80 size=2 val=0x7805
[    6.548180]   READ: bus=0 devfn=0 where=120 size=2 val=0x8001
[    6.553969]   READ: bus=0 devfn=0 where=128 size=2 val=0x10
[    6.559584]   READ: bus=0 devfn=0 where=124 size=2 val=0x6008
[    6.565387]   READ: bus=1 devfn=0 where=164 size=4 val=0x8fc0
[    6.571167]   READ: bus=1 devfn=0 where=6 size=2 val=0x10
[    6.576609]   READ: bus=1 devfn=0 where=52 size=1 val=0x50
[    6.582129]   READ: bus=1 devfn=0 where=80 size=2 val=0x7001
[    6.587821]   READ: bus=1 devfn=0 where=112 size=2 val=0x9005
[    6.593601]   READ: bus=1 devfn=0 where=144 size=2 val=0xa011
[    6.599381]   READ: bus=1 devfn=0 where=160 size=2 val=0x10
[    6.604985]   READ: bus=1 devfn=0 where=84 size=2 val=0x8
[    6.623665] Freeing unused kernel memory: 9216K


PLUG #1
[   66.783559] usb 2-2: new SuperSpeed USB device number 2 using xhci_hcd
[   66.816910] usb 2-2: New USB device found, idVendor=0951, idProduct=1666
[   66.823661] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   66.830909] usb 2-2: Product: DataTraveler 3.0
[   66.835417] usb 2-2: Manufacturer: Kingston
[   66.839660] usb 2-2: SerialNumber: 002618887865F0C0F8646BFA
[   66.848131] usb-storage 2-2:1.0: USB Mass Storage device detected
[   66.854584] scsi host0: usb-storage 2-2:1.0
[   67.869446] scsi 0:0:0:0: Direct-Access     Kingston DataTraveler 3.0      PQ: 0 ANSI: 6
[   67.878270] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[   67.886248] sd 0:0:0:0: [sda] Write Protect is off
[   67.891347] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[   67.902708]  sda: sda1
[   67.906372] sd 0:0:0:0: [sda] Attached SCSI removable disk


UNPLUG #1
[   71.697358]   READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[   71.703572]   READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[   71.709170]  WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[   71.715569] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   71.723632]   READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[   71.729470]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[   71.735373]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[   71.741013]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[   71.746914]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[   71.752552]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[   71.758194] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   71.770008] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   71.778494] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   71.785358]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[   71.791259]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[   71.796897]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[   71.802524] pcieport 0000:00:00.0: AER: Device recovery failed
[   72.451908]   READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[   72.458120]   READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[   72.463717]  WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[   72.470012]   READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[   72.476221]   READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[   72.481819]  WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[   72.488109]   READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[   72.494319]   READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[   72.499916]  WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[   72.506205]   READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[   72.512415]   READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[   72.518011]  WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[   72.524263] xhci_hcd 0000:01:00.0: Cannot set link state.
[   72.529711] usb usb2-port2: cannot disable (err = -32)
[   72.534883] usb 2-2: USB disconnect, device number 2
[   72.540042] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   72.548365]   READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[   72.554264]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[   72.560157]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[   72.565778]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[   72.571654]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[   72.577273]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[   72.582891] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   72.594705] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   72.603122] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   72.609955]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[   72.615833]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[   72.621441]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[   72.627061] pcieport 0000:00:00.0: AER: Device recovery failed
[   72.632931] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   72.640984]   READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[   72.646769]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[   72.652636]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[   72.658245]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[   72.664114]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[   72.669722]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[   72.675330] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   72.687142] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   72.695545] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   72.702376]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[   72.708244]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[   72.713856]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[   72.719473] pcieport 0000:00:00.0: AER: Device recovery failed
[   72.725342] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   72.733394]   READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[   72.739178]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[   72.745044]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[   72.750653]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[   72.756520]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[   72.762128]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[   72.767734] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   72.779548] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   72.787950] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   72.794781]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[   72.800649]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[   72.806258]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[   72.811873] pcieport 0000:00:00.0: AER: Device recovery failed
[   72.817741] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[   72.825793]   READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[   72.831574]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[   72.837442]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[   72.843054]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[   72.848922]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[   72.854529]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[   72.860137] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[   72.871951] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[   72.880353] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[   72.887184]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[   72.893051]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[   72.898660]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[   72.904273] pcieport 0000:00:00.0: AER: Device recovery failed


PLUG #2
[  165.860193] usb 2-2: new SuperSpeed USB device number 3 using xhci_hcd
[  165.893583] usb 2-2: New USB device found, idVendor=0951, idProduct=1666
[  165.900333] usb 2-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[  165.907515] usb 2-2: Product: DataTraveler 3.0
[  165.911989] usb 2-2: Manufacturer: Kingston
[  165.916198] usb 2-2: SerialNumber: 002618887865F0C0F8646BFA
[  165.924547] usb-storage 2-2:1.0: USB Mass Storage device detected
[  165.930970] scsi host0: usb-storage 2-2:1.0
[  166.962705] scsi 0:0:0:0: Direct-Access     Kingston DataTraveler 3.0      PQ: 0 ANSI: 6
[  166.971494] sd 0:0:0:0: [sda] 15109516 512-byte logical blocks: (7.74 GB/7.20 GiB)
[  166.979556] sd 0:0:0:0: [sda] Write Protect is off
[  166.984591] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[  166.995847] random: fast init done
[  166.999430]  sda: sda1
[  167.003039] sd 0:0:0:0: [sda] Attached SCSI removable disk


UNPLUG #2
[  171.918834]   READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[  171.925046]   READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[  171.930645]  WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[  171.936941] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[  171.945000]   READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[  171.950784]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[  171.956656]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[  171.962263]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[  171.968134]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[  171.973741]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[  171.979354] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[  171.991164] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[  171.999597] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[  172.006429]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[  172.012300]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[  172.017908]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[  172.023529] pcieport 0000:00:00.0: AER: Device recovery failed
[  172.675221]   READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[  172.681432]   READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[  172.687030]  WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[  172.693325]   READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[  172.699536]   READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[  172.705133]  WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[  172.711424]   READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[  172.717633]   READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[  172.723230]  WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[  172.729517]   READ: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[  172.735726]   READ: bus=0 devfn=0 where=2100 size=4 val=0x0
[  172.741322]  WRITE: bus=0 devfn=0 where=2096 size=4 val=0x10000024
[  172.747574] xhci_hcd 0000:01:00.0: Cannot set link state.
[  172.753021] usb usb2-port2: cannot disable (err = -32)
[  172.758193] usb 2-2: USB disconnect, device number 3
[  172.763340] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[  172.771627]   READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[  172.777515]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[  172.783408]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[  172.789030]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[  172.794907]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[  172.800526]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[  172.806146] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[  172.817960] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[  172.826375] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[  172.833208]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[  172.839078]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[  172.844685]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[  172.850305] pcieport 0000:00:00.0: AER: Device recovery failed
[  172.856183] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[  172.864236]   READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[  172.870020]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[  172.875889]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[  172.881497]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[  172.887365]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[  172.892974]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[  172.898582] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[  172.910393] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[  172.918796] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[  172.925627]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[  172.931494]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[  172.937107]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[  172.942724] pcieport 0000:00:00.0: AER: Device recovery failed
[  172.948593] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[  172.956644]   READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[  172.962428]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[  172.968295]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[  172.973903]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[  172.979771]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[  172.985379]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[  172.990985] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[  173.002799] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[  173.011202] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[  173.018033]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[  173.023901]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[  173.029510]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[  173.035123] pcieport 0000:00:00.0: AER: Device recovery failed
[  173.040990] pcieport 0000:00:00.0: AER: Uncorrected (Non-Fatal) error received: id=0000
[  173.049042]   READ: bus=0 devfn=0 where=136 size=2 val=0x281f
[  173.054825]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[  173.060693]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[  173.066305]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[  173.072173]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[  173.077780]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[  173.083388] pcieport 0000:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
[  173.095202] pcieport 0000:00:00.0:   device [1105:0024] error status/mask=00004000/00000000
[  173.103605] pcieport 0000:00:00.0:    [14] Completion Timeout     (First)
[  173.110435]   READ: bus=0 devfn=0 where=2052 size=4 val=0x4000
[  173.116303]   READ: bus=0 devfn=0 where=2056 size=4 val=0x0
[  173.121911]   READ: bus=0 devfn=0 where=2072 size=4 val=0xe
[  173.127524] pcieport 0000:00:00.0: AER: Device recovery failed




NOTE BENE: these issues do not occur at all with a USB2 Flash drive.

[ 2093.564771] usb 1-2: new high-speed USB device number 2 using xhci_hcd
[ 2093.790646] usb 1-2: New USB device found, idVendor=058f, idProduct=6387
[ 2093.797397] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 2093.804583] usb 1-2: Product: Mass Storage
[ 2093.808707] usb 1-2: Manufacturer: Generic
[ 2093.812829] usb 1-2: SerialNumber: 31A69E70
[ 2093.819244] usb-storage 1-2:1.0: USB Mass Storage device detected
[ 2093.825624] scsi host0: usb-storage 1-2:1.0
[ 2094.856918] scsi 0:0:0:0: Direct-Access     Generic  Flash Disk       8.07 PQ: 0 ANSI: 2
[ 2094.866196] sd 0:0:0:0: [sda] 4106240 512-byte logical blocks: (2.10 GB/1.96 GiB)
[ 2094.874232] sd 0:0:0:0: [sda] Write Protect is off
[ 2094.879350] sd 0:0:0:0: [sda] No Caching mode page found
[ 2094.884816] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 2094.909111]  sda: sda1
[ 2094.912935] sd 0:0:0:0: [sda] Attached SCSI removable disk

[ 2100.516396] usb 1-2: USB disconnect, device number 2


Regards.
diff mbox

Patch

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 51cd4b8..ade2ad6 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -922,7 +922,8 @@  void xhci_hc_died(struct xhci_hcd *xhci)
         if (xhci->xhc_state & XHCI_STATE_DYING)
                 return;
  
-       xhci_err(xhci, "xHCI host controller not responding, assume dead\n");
+       xhci_err(xhci, "xHC not responding in %pf, assume controller is dead\n",
+                __builtin_return_address(0));
         xhci->xhc_state |= XHCI_STATE_DYING;
  
         xhci_cleanup_command_queue(xhci);