diff mbox series

[v3] usb: warm-reset ports on hub resume, if requested

Message ID 1548851905-1570-1-git-send-email-glogow@fbihome.de (mailing list archive)
State New, archived
Headers show
Series [v3] usb: warm-reset ports on hub resume, if requested | expand

Commit Message

Jan-Marek Glogowski Jan. 30, 2019, 12:38 p.m. UTC
On plug-in of my USB-C device, its USB_SS_PORT_LS_SS_INACTIVE
link state bit is set. Greping all the kernel for this bit shows
that the port status requests a warm-reset this way.

This just happens, if its the only device on that hub and the
hub resumes, so we don't call port_event, which would otherwise
warm-reset ports. The device works ok without this patch, if
there is already any other device connected to the hub.

Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
---

The original thread is "USB-C storage device not detected on USB 3.1 Gen 2
host when plugged in after boot". A different patch, suggested by Mathias
Nyman, didn't work for me. This patch was just rebased on usb-next, but not
re-tested. Original tests are based on 5.0-rc.

v1: This always warm-resets the ports in hub_activate, independent of the
"enum hub_activation_type". Just had a single device to test.

v2: I had the idea about the working device, if there is already a device
connected to the hub and that a resume only on "type == HUB_RESUME" should
be sufficient. This still works for me, but I didn't follow all the
hub_activate callers everywhere and I'm definitly still missing a lot of
knowledge about USB stuff. There is also HUB_RESET_RESUME with a slightly
different code path. I don't know how to trigger this.

v3: code unchanged to v2.

---
 drivers/usb/core/hub.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

Comments

Mathias Nyman Jan. 30, 2019, 2:58 p.m. UTC | #1
On 30.01.2019 14:38, Jan-Marek Glogowski wrote:
> On plug-in of my USB-C device, its USB_SS_PORT_LS_SS_INACTIVE
> link state bit is set. Greping all the kernel for this bit shows
> that the port status requests a warm-reset this way.
> 
> This just happens, if its the only device on that hub and the
> hub resumes, so we don't call port_event, which would otherwise
> warm-reset ports. The device works ok without this patch, if
> there is already any other device connected to the hub.
> 
> Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
> ---
> 
> The original thread is "USB-C storage device not detected on USB 3.1 Gen 2
> host when plugged in after boot". A different patch, suggested by Mathias
> Nyman, didn't work for me. This patch was just rebased on usb-next, but not
> re-tested. Original tests are based on 5.0-rc.
> 

I started to look into this in more detail, disregard the code I sent earlier, it's
not relevant.

I think your patch could be a solution, but it worries me that your logs show hub_event
evt (hub->event_bits) are zero even if port status 0x4002c0 should set a bit there.

 From you earlier log:

[  116.032422] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: Port Status Change Event for port 19
[  116.032432] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: resume root hub
[  116.032447] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: handle_port_status: starting port polling.
[  116.032510] usbcore:usb_remote_wakeup: usb usb3: usb wakeup-resume
[  116.032518] usbcore:hcd_bus_resume: usb usb3: usb auto-resume
[  116.032538] usbcore:hub_resume: hub 3-0:1.0: hub_resume
[  116.032567] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 0 status  = 0x2a0
[  116.032571] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[  116.032602] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 1 status  = 0x2a0
[  116.032606] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[  116.032624] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x4002c0
[  116.032633] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x4002c0
[  116.032654] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x2a0
[  116.032658] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[  116.032675] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 4 status  = 0x2a0
[  116.032679] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[  116.032697] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 5 status  = 0x2a0
[  116.032701] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[  116.032766] usbcore:hub_event: hub 3-0:1.0: state 7 ports 6 chg 0000 evt 0000
[  116.032784] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 0 status  = 0xe0002a0
[  116.032807] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 1 status  = 0xe0002a0
[  116.032829] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 2 status  = 0xe4002c0
[  116.032846] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 3 status  = 0xe0002a0
[  116.032870] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 4 status  = 0xe0002a0
[  116.032892] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 5 status  = 0xe0002a0
[  116.032906] usbcore:hub_suspend: hub 3-0:1.0: hub_suspend
[  116.032925] usbcore:hcd_bus_suspend: usb usb3: bus auto-suspend, wakeup 1
[  116.032946] usbcore:hcd_bus_suspend: usb usb3: suspend raced with wakeup event
[  116.032950] usbcore:hcd_bus_resume: usb usb3: usb auto-resume


The hub->event bits are set when roothub polling timer function gets data from
hub_status_data(), writes the data to a status urb, and returns the urb.

Normal operation:
usb_hcd_poll_rh_status()        // roothub polling timer function
   hcd->driver->hub_status_data(hcd, buffer)   -> xhci_hub_status_data()
     xhci_hub_status_data()      //returns buffer with bits set if a port needs attention
   if (buffer && urb)
     memcpy(urb->transfer_buffer, buffer)       // copy buffer bits to urb
     usb_hcd_giveback_urb(urb)  -> hub_irq      // calls urb complete callback function
       hub_irq()
         hub->event_bits = urb->transfer_buffer //
   mod_timer(usb_hcd_poll_rh_status, future)    // reschedule roothub polling timer

I'm guessing usb_hcd_poll_rh_status() never properly completes a URB in your case.

This could be possible if roothub is polled before properly resumed, then the URB wouldn't
be available yet, and no hub->event_bits are set.
Because no bits are set the hub starts suspending again after resume, probably before roothub
timer polls roothub again.
suspend fails when it manually checks hcd->driver->hub_status_data() for activity, and
it start resuming again, and continues this in a loop.

Either roothub polling is stopped at the suspend attempt, or maybe continuously rescheduled
into the future because as a part of suspend attempt/resume loop.

Could you add one more line of debugging and capture new logs:

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 015b126..6d400f9 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -759,6 +759,10 @@ void usb_hcd_poll_rh_status(struct usb_hcd *hcd)
                 return;
  
         length = hcd->driver->hub_status_data(hcd, buffer);
+
+       dev_err(&hcd->self.root_hub->dev, "%s, length %d, hcd->flags 0x%lx, status_urb %s\n",
+               __func__, length, hcd->flags, hcd->status_urb ? "Exists" : "NULL" );
+
         if (length > 0) {
  
                 /* try to complete the status urb */

Thanks
-Mathias
Jan-Marek Glogowski Jan. 30, 2019, 4:56 p.m. UTC | #2
Am 30.01.19 um 15:58 schrieb Mathias Nyman:
> On 30.01.2019 14:38, Jan-Marek Glogowski wrote:
>> On plug-in of my USB-C device, its USB_SS_PORT_LS_SS_INACTIVE
>> link state bit is set. Greping all the kernel for this bit shows
>> that the port status requests a warm-reset this way.
>>
>> This just happens, if its the only device on that hub and the
>> hub resumes, so we don't call port_event, which would otherwise
>> warm-reset ports. The device works ok without this patch, if
>> there is already any other device connected to the hub.
>>
>> Signed-off-by: Jan-Marek Glogowski <glogow@fbihome.de>
>> ---
>>
>> The original thread is "USB-C storage device not detected on USB 3.1 Gen 2
>> host when plugged in after boot". A different patch, suggested by Mathias
>> Nyman, didn't work for me. This patch was just rebased on usb-next, but not
>> re-tested. Original tests are based on 5.0-rc.
>>
> 
> I started to look into this in more detail, disregard the code I sent earlier, it's
> not relevant.
> 
> I think your patch could be a solution, but it worries me that your logs show hub_event
> evt (hub->event_bits) are zero even if port status 0x4002c0 should set a bit there.

I saw this too (chg 0000 evt 0000). I tried following the change_bits and event_bits
handling and fix it somewhere else, but then realized the port_event was never triggered,
which I expected. And also not the bits handling code in hub_activate.

> From you earlier log:
> 
> [  116.032422] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: Port Status Change Event for port 19
> [  116.032432] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: resume root hub
> [  116.032447] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: handle_port_status: starting port polling.
> [  116.032510] usbcore:usb_remote_wakeup: usb usb3: usb wakeup-resume
> [  116.032518] usbcore:hcd_bus_resume: usb usb3: usb auto-resume
> [  116.032538] usbcore:hub_resume: hub 3-0:1.0: hub_resume
> [  116.032567] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 0 status  = 0x2a0
> [  116.032571] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
> [  116.032602] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 1 status  = 0x2a0
> [  116.032606] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
> [  116.032624] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x4002c0
> [  116.032633] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x4002c0
> [  116.032654] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x2a0
> [  116.032658] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
> [  116.032675] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 4 status  = 0x2a0
> [  116.032679] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
> [  116.032697] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 5 status  = 0x2a0
> [  116.032701] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
> [  116.032766] usbcore:hub_event: hub 3-0:1.0: state 7 ports 6 chg 0000 evt 0000
> [  116.032784] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 0 status  = 0xe0002a0
> [  116.032807] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 1 status  = 0xe0002a0
> [  116.032829] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 2 status  = 0xe4002c0
> [  116.032846] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 3 status  = 0xe0002a0
> [  116.032870] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 4 status  = 0xe0002a0
> [  116.032892] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 5 status  = 0xe0002a0
> [  116.032906] usbcore:hub_suspend: hub 3-0:1.0: hub_suspend
> [  116.032925] usbcore:hcd_bus_suspend: usb usb3: bus auto-suspend, wakeup 1
> [  116.032946] usbcore:hcd_bus_suspend: usb usb3: suspend raced with wakeup event
> [  116.032950] usbcore:hcd_bus_resume: usb usb3: usb auto-resume
> 
> 
> The hub->event bits are set when roothub polling timer function gets data from
> hub_status_data(), writes the data to a status urb, and returns the urb.
> 
> Normal operation:
> usb_hcd_poll_rh_status()        // roothub polling timer function
>   hcd->driver->hub_status_data(hcd, buffer)   -> xhci_hub_status_data()
>     xhci_hub_status_data()      //returns buffer with bits set if a port needs attention
>   if (buffer && urb)
>     memcpy(urb->transfer_buffer, buffer)       // copy buffer bits to urb
>     usb_hcd_giveback_urb(urb)  -> hub_irq      // calls urb complete callback function
>       hub_irq()
>         hub->event_bits = urb->transfer_buffer //
>   mod_timer(usb_hcd_poll_rh_status, future)    // reschedule roothub polling timer
> 
> I'm guessing usb_hcd_poll_rh_status() never properly completes a URB in your case.
> 
> This could be possible if roothub is polled before properly resumed, then the URB wouldn't
> be available yet, and no hub->event_bits are set.
> Because no bits are set the hub starts suspending again after resume, probably before roothub
> timer polls roothub again.
> suspend fails when it manually checks hcd->driver->hub_status_data() for activity, and
> it start resuming again, and continues this in a loop.
> 
> Either roothub polling is stopped at the suspend attempt, or maybe continuously rescheduled
> into the future because as a part of suspend attempt/resume loop.
> 
> Could you add one more line of debugging and capture new logs:
> 
> diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> index 015b126..6d400f9 100644
> --- a/drivers/usb/core/hcd.c
> +++ b/drivers/usb/core/hcd.c
> @@ -759,6 +759,10 @@ void usb_hcd_poll_rh_status(struct usb_hcd *hcd)
>                 return;
>  
>         length = hcd->driver->hub_status_data(hcd, buffer);
> +
> +       dev_err(&hcd->self.root_hub->dev, "%s, length %d, hcd->flags 0x%lx, status_urb %s\n",
> +               __func__, length, hcd->flags, hcd->status_urb ? "Exists" : "NULL" );
> +
>         if (length > 0) {
>  
>                 /* try to complete the status urb */

This is just the plug code. I skipped the module load code, as there was no additional output.

[  485.284810] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: Port Status Change Event for port 19
[  485.284820] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: resume root hub
[  485.284835] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: handle_port_status: starting port polling.
[  485.284850] usb usb2: usb_hcd_poll_rh_status, length 1, hcd->flags 0x195, status_urb NULL
[  485.284909] usbcore:usb_remote_wakeup: usb usb2: usb wakeup-resume
[  485.284917] usbcore:hcd_bus_resume: usb usb2: usb auto-resume
[  485.284937] usbcore:hub_resume: hub 2-0:1.0: hub_resume
[  485.284950] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 0 status  = 0x2a0
[  485.284961] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[  485.284995] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 1 status  = 0x2a0
[  485.285004] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[  485.285020] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x4002c0
[  485.285025] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x4002c0
[  485.285040] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x2a0
[  485.285049] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[  485.285069] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 4 status  = 0x2a0
[  485.285074] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[  485.285088] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 5 status  = 0x2a0
[  485.285092] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[  485.285116] usbcore:hub_event: hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0000
[  485.285132] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 0 status  = 0xe0002a0
[  485.285154] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 1 status  = 0xe0002a0
[  485.285171] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 2 status  = 0xe4002c0
[  485.285192] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 3 status  = 0xe0002a0
[  485.285209] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 4 status  = 0xe0002a0
[  485.285230] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 5 status  = 0xe0002a0
[  485.285243] usbcore:hub_suspend: hub 2-0:1.0: hub_suspend
[  485.285259] usbcore:hcd_bus_suspend: usb usb2: bus auto-suspend, wakeup 1
[  485.285279] usbcore:hcd_bus_suspend: usb usb2: suspend raced with wakeup event
[  485.285283] usbcore:hcd_bus_resume: usb usb2: usb auto-resume
[  485.285774] usb usb2: usb_hcd_poll_rh_status, length 1, hcd->flags 0x18d, status_urb NULL
[  485.301806] usbcore:hub_resume: hub 2-0:1.0: hub_resume
[  485.301819] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 0 status  = 0x2a0
[  485.301823] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[  485.301846] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 1 status  = 0x2a0
[  485.301850] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[  485.301863] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x4002c0
[  485.301867] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x4002c0
[  485.301881] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x2a0
[  485.301885] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[  485.301896] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 4 status  = 0x2a0
[  485.301900] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[  485.301913] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 5 status  = 0x2a0
[  485.301916] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[  485.301936] usbcore:hub_event: hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0000
[  485.301952] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 0 status  = 0xe0002a0
[  485.301968] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 1 status  = 0xe0002a0
[  485.301982] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 2 status  = 0xe4002c0
[  485.301996] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 3 status  = 0xe0002a0
[  485.302010] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 4 status  = 0xe0002a0
[  485.302024] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 5 status  = 0xe0002a0
[  485.302034] usbcore:hub_suspend: hub 2-0:1.0: hub_suspend
[  485.302048] usbcore:hcd_bus_suspend: usb usb2: bus auto-suspend, wakeup 1
[  485.302067] usbcore:hcd_bus_suspend: usb usb2: suspend raced with wakeup event
[  485.302071] usbcore:hcd_bus_resume: usb usb2: usb auto-resume
[  485.309765] usb usb2: usb_hcd_poll_rh_status, length 1, hcd->flags 0x18d, status_urb NULL
...

Jan-Marek

P.S. I'm currently hanging out as jmux in #kernelnewbies, if IRC would help.
Jan-Marek Glogowski Jan. 31, 2019, 10:42 a.m. UTC | #3
Am 30.01.19 um 17:56 schrieb Jan-Marek Glogowski:
> Am 30.01.19 um 15:58 schrieb Mathias Nyman:
>> On 30.01.2019 14:38, Jan-Marek Glogowski wrote:
>>
>> The hub->event bits are set when roothub polling timer function gets data from
>> hub_status_data(), writes the data to a status urb, and returns the urb.
>>
>> Normal operation:
>> usb_hcd_poll_rh_status()        // roothub polling timer function
>>   hcd->driver->hub_status_data(hcd, buffer)   -> xhci_hub_status_data()
>>     xhci_hub_status_data()      //returns buffer with bits set if a port needs attention
>>   if (buffer && urb)
>>     memcpy(urb->transfer_buffer, buffer)       // copy buffer bits to urb
>>     usb_hcd_giveback_urb(urb)  -> hub_irq      // calls urb complete callback function
>>       hub_irq()
>>         hub->event_bits = urb->transfer_buffer //
>>   mod_timer(usb_hcd_poll_rh_status, future)    // reschedule roothub polling timer
>>
>> I'm guessing usb_hcd_poll_rh_status() never properly completes a URB in your case.
>>
>> This could be possible if roothub is polled before properly resumed, then the URB wouldn't
>> be available yet, and no hub->event_bits are set.
>> Because no bits are set the hub starts suspending again after resume, probably before roothub
>> timer polls roothub again.
>> suspend fails when it manually checks hcd->driver->hub_status_data() for activity, and
>> it start resuming again, and continues this in a loop.
>>
>> Either roothub polling is stopped at the suspend attempt, or maybe continuously rescheduled
>> into the future because as a part of suspend attempt/resume loop.

So as you expected the status_urb is always NULL.

I tried to understand more of the code by adding some more debug message.

I looked at usb_hcd_resume_root_hub, which is called when we see the "resume root hub" dmesg.
This just queues the hcd->wakeup_work, so indeed, if usb_hcd_poll_rh_status is later called,
the hub might not have resumed yet and polling starts. So my simple approach was to add a
function to call flush_work(&hcd->wakeup_work) before usb_hcd_poll_rh_status. All variants I
tested killed the kernel. I also tried to add a udelay to no avail.

So I just added the usb_hcd_poll_rh_status to the hcd->wakeup_work, if we resume the hub,
which works for me, and results in the following patch:

diff --git a/core/hcd.c b/core/hcd.c
index 015b126..06a3bf6 100644
--- a/core/hcd.c
+++ b/core/hcd.c
@@ -759,6 +759,10 @@ void usb_hcd_poll_rh_status(struct usb_hcd *hcd)
 		return;

 	length = hcd->driver->hub_status_data(hcd, buffer);
+
+	dev_err(&hcd->self.root_hub->dev, "%s, length %d, hcd->flags 0x%lx, status_urb %s\n",
+		__func__, length, hcd->flags, hcd->status_urb ? "Exists" : "NULL" );
+
 	if (length > 0) {

 		/* try to complete the status urb */
@@ -2347,6 +2351,7 @@ static void hcd_resume_work(struct work_struct *work)
 	struct usb_device *udev = hcd->self.root_hub;

 	usb_remote_wakeup(udev);
+	usb_hcd_poll_rh_status(hcd);
 }

 /**
diff --git a/core/hub.c b/core/hub.c
index 1d1e61e..cb41cc9 100644
--- a/core/hub.c
+++ b/core/hub.c
@@ -3563,7 +3563,7 @@ int usb_remote_wakeup(struct usb_device *udev)
 	if (udev->state == USB_STATE_SUSPENDED) {
 		dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-");
 		status = usb_autoresume_device(udev);
-		if (status == 0) {
+		if (status != 0) {
 			/* Let the drivers do their thing, then... */
 			usb_autosuspend_device(udev);
 		}
diff --git a/xhci-ring.c b/xhci-ring.c
index 40fa25c..1f8b8c3 100644
--- a/xhci-ring.c
+++ b/xhci-ring.c
@@ -1562,6 +1562,7 @@ static void handle_port_status(struct xhci_hcd *xhci,
 	struct xhci_bus_state *bus_state;
 	bool bogus_port_status = false;
 	struct xhci_port *port;
+	bool needs_root_hub_resume;

 	/* Port status change events always have a successful completion code */
 	if (GET_COMP_CODE(le32_to_cpu(event->generic.field[2])) != COMP_SUCCESS)
@@ -1599,7 +1600,8 @@ static void handle_port_status(struct xhci_hcd *xhci,

 	trace_xhci_handle_port_status(hcd_portnum, portsc);

-	if (hcd->state == HC_STATE_SUSPENDED) {
+	needs_root_hub_resume = hcd->state == HC_STATE_SUSPENDED;
+	if (needs_root_hub_resume) {
 		xhci_dbg(xhci, "resume root hub\n");
 		usb_hcd_resume_root_hub(hcd);
 	}
@@ -1712,7 +1714,8 @@ cleanup:
 	set_bit(HCD_FLAG_POLL_RH, &hcd->flags);
 	spin_unlock(&xhci->lock);
 	/* Pass this up to the core */
-	usb_hcd_poll_rh_status(hcd);
+	if (!needs_root_hub_resume)
+		usb_hcd_poll_rh_status(hcd);
 	spin_lock(&xhci->lock);
 }

Which gets me the following dmesg output after the plug:

[ 2565.543348] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: Port Status Change Event for port 19
[ 2565.543358] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: resume root hub
[ 2565.543375] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: handle_port_status: starting port polling.
[ 2565.543432] usbcore:usb_remote_wakeup: usb usb2: usb wakeup-resume
[ 2565.543440] usbcore:hcd_bus_resume: usb usb2: usb auto-resume
[ 2565.543461] usbcore:hub_resume: hub 2-0:1.0: hub_resume
[ 2565.543478] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 0 status  = 0x2a0
[ 2565.543488] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[ 2565.543526] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 1 status  = 0x2a0
[ 2565.543530] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[ 2565.543546] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x4002c0
[ 2565.543550] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x4002c0
[ 2565.543566] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x2a0
[ 2565.543573] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[ 2565.543591] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 4 status  = 0x2a0
[ 2565.543595] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[ 2565.543610] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 5 status  = 0x2a0
[ 2565.543614] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[ 2565.543653] usb usb2: usb_hcd_poll_rh_status, length 1, hcd->flags 0x1a5, status_urb Exists
[ 2565.543663] usbcore:hub_event: hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0000
[ 2565.543680] usbcore:hub_event: hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0008
[ 2565.543696] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x4002c0
[ 2565.543705] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x4002c0
[ 2565.543718] usbcore:port_event: usb usb2-port3: link state change
[ 2565.543732] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port link state change, actual port 2 status  = 0x2c0
[ 2565.543744] usbcore:port_event: usb usb2-port3: do warm reset
[ 2565.602302] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x2b0
[ 2565.602308] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2b0
[ 2565.602338] usbcore:hub_port_wait_reset: usb usb2-port3: not warm reset yet, waiting 50ms
[ 2565.649226] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: Port Status Change Event for port 19
[ 2565.649236] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: handle_port_status: starting port polling.
[ 2565.649251] usb usb2: usb_hcd_poll_rh_status, length 1, hcd->flags 0x1a5, status_urb Exists
[ 2565.658200] usb usb2: usb_hcd_poll_rh_status, length 1, hcd->flags 0x1a5, status_urb Exists
[ 2565.662258] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x2a1203
[ 2565.662264] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x310203
[ 2565.662305] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port reset change, actual port 2 status  = 0xa1203
[ 2565.662323] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port warm(BH) reset change, actual port 2 status  = 0x21203
[ 2565.662340] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port link state change, actual port 2 status  = 0x21203
[ 2565.662354] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x21203
[ 2565.662358] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x10203
[ 2565.722034] usbcore:hub_event: hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0008
[ 2565.722049] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x21203
[ 2565.722054] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x10203
[ 2565.722080] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port connect change, actual port 2 status  = 0x1203
[ 2565.722092] usbcore:hub_port_connect_change: usb usb2-port3: status 0203, change 0001, 10.0 Gb/s
[ 2565.722099] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x1203
[ 2565.722103] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x203
[ 2565.758232] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x1203
[ 2565.758237] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x203
[ 2565.794301] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x1203
[ 2565.794307] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x203
[ 2565.830243] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x1203
[ 2565.830249] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x203
[ 2565.866217] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x1203
[ 2565.866223] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x203
[ 2565.866249] usbcore:hub_port_debounce: usb usb2-port3: debounce total 100ms stable 100ms status 0x203
[ 2565.866260] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2565.866321] xhci_hcd:xhci_alloc_virt_device: xhci_hcd 0000:00:14.0: Slot 1 output ctx = 0x44f6d4000 (dma)
[ 2565.866328] xhci_hcd:xhci_alloc_virt_device: xhci_hcd 0000:00:14.0: Slot 1 input ctx = 0x454bed000 (dma)
[ 2565.866340] xhci_hcd:xhci_alloc_virt_device: xhci_hcd 0000:00:14.0: Set slot id 1 dcbaa entry 00000000bb5769a7 to 0x44f6d4000
[ 2565.866385] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x1203
[ 2565.866390] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x203
[ 2565.866409] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port reset, actual port 2 status  = 0x1311
[ 2565.866417] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: Port Status Change Event for port 19
[ 2565.866423] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: handle_port_status: starting port polling.
[ 2565.866435] usb usb2: usb_hcd_poll_rh_status, length 0, hcd->flags 0x1a5, status_urb Exists
[ 2565.906199] usb usb2: usb_hcd_poll_rh_status, length 0, hcd->flags 0x1a5, status_urb Exists
[ 2565.934213] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x201203
[ 2565.934219] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x100203
[ 2565.934251] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port reset change, actual port 2 status  = 0x1203
[ 2565.934268] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port warm(BH) reset change, actual port 2 status  = 0x1203
[ 2565.934286] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port link state change, actual port 2 status  = 0x1203
[ 2565.934300] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port connect change, actual port 2 status  = 0x1203
[ 2565.934314] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x1203
[ 2565.934318] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x203
[ 2565.994227] xhci_hcd:xhci_setup_addressable_virt_dev: xhci_hcd 0000:00:14.0: Set root hub portnum to 19
[ 2565.994233] xhci_hcd:xhci_setup_addressable_virt_dev: xhci_hcd 0000:00:14.0: Set fake root hub portnum to 3
[ 2565.994238] xhci_hcd:xhci_setup_addressable_virt_dev: xhci_hcd 0000:00:14.0: udev->tt =           (null)
[ 2565.994243] xhci_hcd:xhci_setup_addressable_virt_dev: xhci_hcd 0000:00:14.0: udev->ttport = 0x0
[ 2565.994250] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2565.994360] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Successful setup address command
[ 2565.994370] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Op regs DCBAA ptr = 0x000004582b7000
[ 2565.994378] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Slot ID 1 dcbaa entry @00000000bb5769a7 = 0x0000044f6d4000
[ 2565.994383] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Output Context DMA address = 0x44f6d4000
[ 2565.994388] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Internal device address = 0
[ 2565.994398] usb 2-3: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[ 2566.014881] usbcore:usb_parse_endpoint: usb 2-3: skipped 1 descriptor after endpoint
[ 2566.014887] usbcore:usb_parse_endpoint: usb 2-3: skipped 1 descriptor after endpoint
[ 2566.014918] xhci_hcd:process_ctrl_td: xhci_hcd 0000:00:14.0: Waiting for status stage event
[ 2566.014964] usbcore:usb_get_langid: usb 2-3: default language 0x0409
[ 2566.014996] xhci_hcd:process_ctrl_td: xhci_hcd 0000:00:14.0: Waiting for status stage event
[ 2566.015075] xhci_hcd:process_ctrl_td: xhci_hcd 0000:00:14.0: Waiting for status stage event
[ 2566.015155] xhci_hcd:process_ctrl_td: xhci_hcd 0000:00:14.0: Waiting for status stage event
[ 2566.015204] usbcore:usb_new_device: usb 2-3: udev 2, busnum 2, minor = 129
[ 2566.015210] usb 2-3: New USB device found, idVendor=0781, idProduct=5596, bcdDevice= 1.00
[ 2566.015214] usb 2-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 2566.015217] usb 2-3: Product: Ultra T C
[ 2566.015221] usb 2-3: Manufacturer: SanDisk
[ 2566.015224] usb 2-3: SerialNumber: 4C530001090830111403
[ 2566.015452] usbcore:usb_probe_device: usb 2-3: usb_probe_device
[ 2566.015458] usbcore:usb_choose_configuration: usb 2-3: configuration #1 chosen from 1 choice
[ 2566.015492] xhci_hcd:xhci_add_endpoint: xhci_hcd 0000:00:14.0: add ep 0x81, slot id 1, new drop flags = 0x0, new add flags = 0x8
[ 2566.015513] xhci_hcd:xhci_add_endpoint: xhci_hcd 0000:00:14.0: add ep 0x2, slot id 1, new drop flags = 0x0, new add flags = 0x18
[ 2566.015519] xhci_hcd:xhci_check_bandwidth: xhci_hcd 0000:00:14.0: xhci_check_bandwidth called for udev 000000008c40d767
[ 2566.015526] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2566.015667] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Successful Endpoint Configure command
[ 2566.015677] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2566.015698] xhci_hcd:handle_tx_event: xhci_hcd 0000:00:14.0: Stopped on No-op or Link TRB for slot 1 ep 2
[ 2566.015708] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2566.015848] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2566.015871] xhci_hcd:handle_tx_event: xhci_hcd 0000:00:14.0: Stopped on No-op or Link TRB for slot 1 ep 3
[ 2566.015881] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2566.016154] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Set up evaluate context for LPM MEL change.
[ 2566.016159] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2566.016186] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Successful evaluate context command
[ 2566.016292] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Set up evaluate context for LPM MEL change.
[ 2566.016297] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2566.016331] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Successful evaluate context command
[ 2566.016389] usbcore:usb_set_configuration: usb 2-3: adding 2-3:1.0 (config #1, interface 0)
[ 2566.046707] usbcore:usb_probe_interface: usb-storage 2-3:1.0: usb_probe_interface
[ 2566.046718] usbcore:usb_probe_interface: usb-storage 2-3:1.0: usb_probe_interface - got id
[ 2566.046723] usb-storage 2-3:1.0: USB Mass Storage device detected
[ 2566.047790] scsi host6: usb-storage 2-3:1.0
[ 2566.047974] usbcore: registered new interface driver usb-storage
[ 2566.051272] usbcore: registered new interface driver uas
[ 2566.154204] xhci_hcd:xhci_hub_status_data: xhci_hcd 0000:00:14.0: xhci_hub_status_data: stopping port polling.
[ 2566.154213] usb usb2: usb_hcd_poll_rh_status, length 0, hcd->flags 0x1a1, status_urb Exists
[ 2567.063435] scsi 6:0:0:0: Direct-Access     SanDisk  Ultra T C        1.00 PQ: 0 ANSI: 6
[ 2567.064071] sd 6:0:0:0: Attached scsi generic sg1 type 0
[ 2567.064276] sd 6:0:0:0: [sdb] 60063744 512-byte logical blocks: (30.8 GB/28.6 GiB)
[ 2567.064701] xhci_hcd:process_bulk_intr_td: xhci_hcd 0000:00:14.0: ep 0x81 - asked for 192 bytes, 124 bytes untransferred
[ 2567.064710] xhci_hcd:xhci_td_cleanup: xhci_hcd 0000:00:14.0: Giveback URB 000000005b7fd236, len = 68, expected = 192, status = 0
[ 2567.064941] xhci_hcd:handle_tx_event: xhci_hcd 0000:00:14.0: Stalled endpoint for slot 1 ep 2
[ 2567.064949] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
[ 2567.064954] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Finding endpoint context
[ 2567.064959] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Cycle state = 0x1
[ 2567.064965] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: New dequeue segment = 000000006c31f892 (virtual)
[ 2567.064970] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: New dequeue pointer = 0x4587f2070 (DMA)
[ 2567.064974] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Queueing new dequeue state
[ 2567.064982] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 000000006c31f892 (0x4587f2000 dma), new deq ptr = 0000000092dfa1c2 (0x4587f2070 dma), new cycle = 1
[ 2567.064986] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2567.064995] xhci_hcd:xhci_td_cleanup: xhci_hcd 0000:00:14.0: Giveback URB 000000007c755c16, len = 0, expected = 13, status = -32
[ 2567.065006] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
[ 2567.065013] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @4587f2070
[ 2567.065184] sd 6:0:0:0: [sdb] Write Protect is off
[ 2567.065189] sd 6:0:0:0: [sdb] Mode Sense: 43 00 00 00
[ 2567.065297] xhci_hcd:process_bulk_intr_td: xhci_hcd 0000:00:14.0: ep 0x81 - asked for 192 bytes, 124 bytes untransferred
[ 2567.065304] xhci_hcd:xhci_td_cleanup: xhci_hcd 0000:00:14.0: Giveback URB 00000000b751a34e, len = 68, expected = 192, status = 0
[ 2567.065334] xhci_hcd:handle_tx_event: xhci_hcd 0000:00:14.0: Stalled endpoint for slot 1 ep 2
[ 2567.065341] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
[ 2567.065346] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Finding endpoint context
[ 2567.065350] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Cycle state = 0x1
[ 2567.065355] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: New dequeue segment = 000000006c31f892 (virtual)
[ 2567.065360] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: New dequeue pointer = 0x4587f20a0 (DMA)
[ 2567.065364] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Queueing new dequeue state
[ 2567.065371] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 000000006c31f892 (0x4587f2000 dma), new deq ptr = 00000000752c9e76 (0x4587f20a0 dma), new cycle = 1
[ 2567.065376] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2567.065384] xhci_hcd:xhci_td_cleanup: xhci_hcd 0000:00:14.0: Giveback URB 000000007c755c16, len = 0, expected = 13, status = -32
[ 2567.065391] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
[ 2567.065397] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @4587f20a0
[ 2567.065504] sd 6:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 2567.066508] xhci_hcd:process_bulk_intr_td: xhci_hcd 0000:00:14.0: ep 0x81 - asked for 96 bytes, 68 bytes untransferred
[ 2567.066515] xhci_hcd:xhci_td_cleanup: xhci_hcd 0000:00:14.0: Giveback URB 000000006b0219e8, len = 28, expected = 96, status = 0
[ 2567.066558] xhci_hcd:handle_tx_event: xhci_hcd 0000:00:14.0: Stalled endpoint for slot 1 ep 2
[ 2567.066564] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
[ 2567.066569] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Finding endpoint context
[ 2567.066573] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Cycle state = 0x1
[ 2567.066578] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: New dequeue segment = 000000006c31f892 (virtual)
[ 2567.066583] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: New dequeue pointer = 0x4587f20f0 (DMA)
[ 2567.066587] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Queueing new dequeue state
[ 2567.066594] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 000000006c31f892 (0x4587f2000 dma), new deq ptr = 00000000f42a488e (0x4587f20f0 dma), new cycle = 1
[ 2567.066598] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2567.066606] xhci_hcd:xhci_td_cleanup: xhci_hcd 0000:00:14.0: Giveback URB 000000007c755c16, len = 0, expected = 13, status = -32
[ 2567.066615] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
[ 2567.066621] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @4587f20f0
[ 2567.068530] xhci_hcd:process_bulk_intr_td: xhci_hcd 0000:00:14.0: ep 0x81 - asked for 192 bytes, 124 bytes untransferred
[ 2567.068538] xhci_hcd:xhci_td_cleanup: xhci_hcd 0000:00:14.0: Giveback URB 00000000af01d088, len = 68, expected = 192, status = 0
[ 2567.068609] xhci_hcd:handle_tx_event: xhci_hcd 0000:00:14.0: Stalled endpoint for slot 1 ep 2
[ 2567.068617] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
[ 2567.068621] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Finding endpoint context
[ 2567.068627] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Cycle state = 0x1
[ 2567.068632] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: New dequeue segment = 000000006c31f892 (virtual)
[ 2567.068637] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: New dequeue pointer = 0x4587f2150 (DMA)
[ 2567.068653] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Queueing new dequeue state
[ 2567.068654] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 000000006c31f892 (0x4587f2000 dma), new deq ptr = 00000000419053a2 (0x4587f2150 dma), new cycle = 1
[ 2567.068655] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2567.068659] xhci_hcd:xhci_td_cleanup: xhci_hcd 0000:00:14.0: Giveback URB 000000007c755c16, len = 0, expected = 13, status = -32
[ 2567.068661] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
[ 2567.068666] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @4587f2150
[ 2567.068982] xhci_hcd:process_bulk_intr_td: xhci_hcd 0000:00:14.0: ep 0x81 - asked for 192 bytes, 124 bytes untransferred
[ 2567.068984] xhci_hcd:xhci_td_cleanup: xhci_hcd 0000:00:14.0: Giveback URB 00000000851894f8, len = 68, expected = 192, status = 0
[ 2567.069032] xhci_hcd:handle_tx_event: xhci_hcd 0000:00:14.0: Stalled endpoint for slot 1 ep 2
[ 2567.069034] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
[ 2567.069035] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Finding endpoint context
[ 2567.069036] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Cycle state = 0x1
[ 2567.069037] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: New dequeue segment = 000000006c31f892 (virtual)
[ 2567.069039] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: New dequeue pointer = 0x4587f2180 (DMA)
[ 2567.069039] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Queueing new dequeue state
[ 2567.069041] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 000000006c31f892 (0x4587f2000 dma), new deq ptr = 00000000bcfc19be (0x4587f2180 dma), new cycle = 1
[ 2567.069042] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2567.069047] xhci_hcd:xhci_td_cleanup: xhci_hcd 0000:00:14.0: Giveback URB 000000007c755c16, len = 0, expected = 13, status = -32
[ 2567.069049] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
[ 2567.069074] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @4587f2180
[ 2567.081299]  sdb: sdb1
[ 2567.082144] xhci_hcd:process_bulk_intr_td: xhci_hcd 0000:00:14.0: ep 0x81 - asked for 192 bytes, 124 bytes untransferred
[ 2567.082147] xhci_hcd:xhci_td_cleanup: xhci_hcd 0000:00:14.0: Giveback URB 0000000073ddcc93, len = 68, expected = 192, status = 0
[ 2567.082377] xhci_hcd:handle_tx_event: xhci_hcd 0000:00:14.0: Stalled endpoint for slot 1 ep 2
[ 2567.082380] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
[ 2567.082381] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Finding endpoint context
[ 2567.082383] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Cycle state = 0x1
[ 2567.082384] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: New dequeue segment = 000000006c31f892 (virtual)
[ 2567.082386] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: New dequeue pointer = 0x4587f2200 (DMA)
[ 2567.082387] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Queueing new dequeue state
[ 2567.082389] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 000000006c31f892 (0x4587f2000 dma), new deq ptr = 000000006f709260 (0x4587f2200 dma), new cycle = 1
[ 2567.082391] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2567.082395] xhci_hcd:xhci_td_cleanup: xhci_hcd 0000:00:14.0: Giveback URB 000000007c755c16, len = 0, expected = 13, status = -32
[ 2567.082398] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
[ 2567.082400] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @4587f2200
[ 2567.082595] xhci_hcd:process_bulk_intr_td: xhci_hcd 0000:00:14.0: ep 0x81 - asked for 192 bytes, 124 bytes untransferred
[ 2567.082597] xhci_hcd:xhci_td_cleanup: xhci_hcd 0000:00:14.0: Giveback URB 00000000b751a34e, len = 68, expected = 192, status = 0
[ 2567.082634] xhci_hcd:handle_tx_event: xhci_hcd 0000:00:14.0: Stalled endpoint for slot 1 ep 2
[ 2567.082636] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Cleaning up stalled endpoint ring
[ 2567.082637] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Finding endpoint context
[ 2567.082639] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Cycle state = 0x1
[ 2567.082640] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: New dequeue segment = 000000006c31f892 (virtual)
[ 2567.082641] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: New dequeue pointer = 0x4587f2230 (DMA)
[ 2567.082643] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Queueing new dequeue state
[ 2567.082645] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Set TR Deq Ptr cmd, new deq seg = 000000006c31f892 (0x4587f2000 dma), new deq ptr = 000000007e2dd3a1 (0x4587f2230 dma), new cycle = 1
[ 2567.082646] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[ 2567.082650] xhci_hcd:xhci_td_cleanup: xhci_hcd 0000:00:14.0: Giveback URB 000000007c755c16, len = 0, expected = 13, status = -32
[ 2567.082653] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Ignoring reset ep completion code of 1
[ 2567.082676] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Successful Set TR Deq Ptr cmd, deq = @4587f2230
[ 2567.082790] sd 6:0:0:0: [sdb] Attached SCSI removable disk

There is an additional auto-resume => auto-suspend cycle without:

diff --git a/core/hub.c b/core/hub.c
index 1d1e61e..cb41cc9 100644
--- a/core/hub.c
+++ b/core/hub.c
@@ -3563,7 +3563,7 @@ int usb_remote_wakeup(struct usb_device *udev)
 	if (udev->state == USB_STATE_SUSPENDED) {
 		dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-");
 		status = usb_autoresume_device(udev);
-		if (status == 0) {
+		if (status != 0) {
 			/* Let the drivers do their thing, then... */
 			usb_autosuspend_device(udev);
 		}

I don't understand why we auto-suspend directly after a successful auto-resume.

The hunk is from a huge patch:

commit 9bbdf1e0afe771ca7650f9f476769310bee9d8f3
Author: Alan Stern <stern@rowland.harvard.edu>
Date:   Fri Jan 8 12:57:28 2010 -0500

    USB: convert to the runtime PM framework

But probably there is a better way to wait for the status_urb.
I definitely need some comments on this.

Jan-Marek
Mathias Nyman Jan. 31, 2019, 2:56 p.m. UTC | #4
On 31.01.2019 12:42, Jan-Marek Glogowski wrote:
> Am 30.01.19 um 17:56 schrieb Jan-Marek Glogowski:
>> Am 30.01.19 um 15:58 schrieb Mathias Nyman:
>>> On 30.01.2019 14:38, Jan-Marek Glogowski wrote:
>>>
>>> The hub->event bits are set when roothub polling timer function gets data from
>>> hub_status_data(), writes the data to a status urb, and returns the urb.
>>>
>>> Normal operation:
>>> usb_hcd_poll_rh_status()        // roothub polling timer function
>>>    hcd->driver->hub_status_data(hcd, buffer)   -> xhci_hub_status_data()
>>>      xhci_hub_status_data()      //returns buffer with bits set if a port needs attention
>>>    if (buffer && urb)
>>>      memcpy(urb->transfer_buffer, buffer)       // copy buffer bits to urb
>>>      usb_hcd_giveback_urb(urb)  -> hub_irq      // calls urb complete callback function
>>>        hub_irq()
>>>          hub->event_bits = urb->transfer_buffer //
>>>    mod_timer(usb_hcd_poll_rh_status, future)    // reschedule roothub polling timer
>>>
>>> I'm guessing usb_hcd_poll_rh_status() never properly completes a URB in your case.
>>>
>>> This could be possible if roothub is polled before properly resumed, then the URB wouldn't
>>> be available yet, and no hub->event_bits are set.
>>> Because no bits are set the hub starts suspending again after resume, probably before roothub
>>> timer polls roothub again.
>>> suspend fails when it manually checks hcd->driver->hub_status_data() for activity, and
>>> it start resuming again, and continues this in a loop.
>>>
>>> Either roothub polling is stopped at the suspend attempt, or maybe continuously rescheduled
>>> into the future because as a part of suspend attempt/resume loop.
> 
> So as you expected the status_urb is always NULL.
> 
> I tried to understand more of the code by adding some more debug message.
> 
> I looked at usb_hcd_resume_root_hub, which is called when we see the "resume root hub" dmesg.
> This just queues the hcd->wakeup_work, so indeed, if usb_hcd_poll_rh_status is later called,
> the hub might not have resumed yet and polling starts. So my simple approach was to add a
> function to call flush_work(&hcd->wakeup_work) before usb_hcd_poll_rh_status. All variants I
> tested killed the kernel. I also tried to add a udelay to no avail.

I think the idea is that if usb_hcd_poll_rh_status() finds a port that needs attention, but
the is URB missing, then a HCD_FLAG_POLL_PENDING flag is set.

This flag is checked in resume when submitting the hub URB, and idea is to react on it:

hub_activate(HUB_RESUME)
     usb_submit_urb(hub->urb)
      rh_urb_enqueue()
        rh_queue_status()
           hcd->status_urb = urb;
           if (HCD_POLL_PENDING(hcd)
              mod_timer(&hcd->rh_timer, jiffies);

It tries to trigger usb_hcd_poll_rh_status() immediately by modifying the timer,
but from your logs it seems that it's not fast enough:

[  485.301806] usbcore:hub_resume: hub 2-0:1.0: hub_resume
[  485.301819] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 0 status  = 0x2a0
  ...
[  485.301936] usbcore:hub_event: hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0000
[  485.301952] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 0 status  = 0xe0002a0
...
[  485.302034] usbcore:hub_suspend: hub 2-0:1.0: hub_suspend
[  485.302048] usbcore:hcd_bus_suspend: usb usb2: bus auto-suspend, wakeup 1
[  485.302067] usbcore:hcd_bus_suspend: usb usb2: suspend raced with wakeup event
[  485.302071] usbcore:hcd_bus_resume: usb usb2: usb auto-resume
[  485.309765] usb usb2: usb_hcd_poll_rh_status, length 1, hcd->flags 0x18d, status_urb NULL

hcd->flags 0x18d means HCD_FLAG_POLL_PENDING  was set, but hub_event() workqueue and hub_suspend()
are run before polling the root hub, and I guess the URB gets removed at some point during suspsend.

This then goes on in a loop.

You could try running usb_hcd_poll_rh_status() direcly instead of just kicking the timer and see if
it works:

@@ -822,8 +826,11 @@ static int rh_queue_status (struct usb_hcd *hcd, struct urb *urb)
                 mod_timer(&hcd->rh_timer, (jiffies/(HZ/4) + 1) * (HZ/4));
  
         /* If a status change has already occurred, report it ASAP */
-       else if (HCD_POLL_PENDING(hcd))
-               mod_timer(&hcd->rh_timer, jiffies);
+       else if (HCD_POLL_PENDING(hcd)) {
+               spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
+               usb_hcd_poll_rh_status(hcd);
+               return 0;
+       }
         retval = 0;
   done:
         spin_unlock_irqrestore (&hcd_root_hub_lock, flags);


I didn't check locking or running contexts if it's ok to do this, there's could be a reason
why it wasn't done like this in the first place.
Worth a shot still.

-Mathias
Jan-Marek Glogowski Jan. 31, 2019, 3:13 p.m. UTC | #5
Am 31.01.19 um 15:56 schrieb Mathias Nyman:
> On 31.01.2019 12:42, Jan-Marek Glogowski wrote:
>> Am 30.01.19 um 17:56 schrieb Jan-Marek Glogowski:
>>> Am 30.01.19 um 15:58 schrieb Mathias Nyman:
>>>> On 30.01.2019 14:38, Jan-Marek Glogowski wrote:
>>>>
>>>> The hub->event bits are set when roothub polling timer function gets data from
>>>> hub_status_data(), writes the data to a status urb, and returns the urb.
>>>>
>>>> Normal operation:
>>>> usb_hcd_poll_rh_status()        // roothub polling timer function
>>>>    hcd->driver->hub_status_data(hcd, buffer)   -> xhci_hub_status_data()
>>>>      xhci_hub_status_data()      //returns buffer with bits set if a port needs attention
>>>>    if (buffer && urb)
>>>>      memcpy(urb->transfer_buffer, buffer)       // copy buffer bits to urb
>>>>      usb_hcd_giveback_urb(urb)  -> hub_irq      // calls urb complete callback function
>>>>        hub_irq()
>>>>          hub->event_bits = urb->transfer_buffer //
>>>>    mod_timer(usb_hcd_poll_rh_status, future)    // reschedule roothub polling timer
>>>>
>>>> I'm guessing usb_hcd_poll_rh_status() never properly completes a URB in your case.
>>>>
>>>> This could be possible if roothub is polled before properly resumed, then the URB wouldn't
>>>> be available yet, and no hub->event_bits are set.
>>>> Because no bits are set the hub starts suspending again after resume, probably before roothub
>>>> timer polls roothub again.
>>>> suspend fails when it manually checks hcd->driver->hub_status_data() for activity, and
>>>> it start resuming again, and continues this in a loop.
>>>>
>>>> Either roothub polling is stopped at the suspend attempt, or maybe continuously rescheduled
>>>> into the future because as a part of suspend attempt/resume loop.
>>
>> So as you expected the status_urb is always NULL.
>>
>> I tried to understand more of the code by adding some more debug message.
>>
>> I looked at usb_hcd_resume_root_hub, which is called when we see the "resume root hub" dmesg.
>> This just queues the hcd->wakeup_work, so indeed, if usb_hcd_poll_rh_status is later called,
>> the hub might not have resumed yet and polling starts. So my simple approach was to add a
>> function to call flush_work(&hcd->wakeup_work) before usb_hcd_poll_rh_status. All variants I
>> tested killed the kernel. I also tried to add a udelay to no avail.
> 
> I think the idea is that if usb_hcd_poll_rh_status() finds a port that needs attention, but
> the is URB missing, then a HCD_FLAG_POLL_PENDING flag is set.
> 
> This flag is checked in resume when submitting the hub URB, and idea is to react on it:
> 
> hub_activate(HUB_RESUME)
>     usb_submit_urb(hub->urb)
>      rh_urb_enqueue()
>        rh_queue_status()
>           hcd->status_urb = urb;
>           if (HCD_POLL_PENDING(hcd)
>              mod_timer(&hcd->rh_timer, jiffies);
> 
> It tries to trigger usb_hcd_poll_rh_status() immediately by modifying the timer,
> but from your logs it seems that it's not fast enough:
> 
> [  485.301806] usbcore:hub_resume: hub 2-0:1.0: hub_resume
> [  485.301819] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 0 status  = 0x2a0
>  ...
> [  485.301936] usbcore:hub_event: hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0000
> [  485.301952] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 0 status  = 0xe0002a0
> ...
> [  485.302034] usbcore:hub_suspend: hub 2-0:1.0: hub_suspend
> [  485.302048] usbcore:hcd_bus_suspend: usb usb2: bus auto-suspend, wakeup 1
> [  485.302067] usbcore:hcd_bus_suspend: usb usb2: suspend raced with wakeup event
> [  485.302071] usbcore:hcd_bus_resume: usb usb2: usb auto-resume
> [  485.309765] usb usb2: usb_hcd_poll_rh_status, length 1, hcd->flags 0x18d, status_urb NULL
> 
> hcd->flags 0x18d means HCD_FLAG_POLL_PENDING  was set, but hub_event() workqueue and hub_suspend()
> are run before polling the root hub, and I guess the URB gets removed at some point during suspsend.
> 
> This then goes on in a loop.
> 
> You could try running usb_hcd_poll_rh_status() direcly instead of just kicking the timer and see if
> it works:
> 
> @@ -822,8 +826,11 @@ static int rh_queue_status (struct usb_hcd *hcd, struct urb *urb)
>                 mod_timer(&hcd->rh_timer, (jiffies/(HZ/4) + 1) * (HZ/4));
>  
>         /* If a status change has already occurred, report it ASAP */
> -       else if (HCD_POLL_PENDING(hcd))
> -               mod_timer(&hcd->rh_timer, jiffies);
> +       else if (HCD_POLL_PENDING(hcd)) {
> +               spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
> +               usb_hcd_poll_rh_status(hcd);
> +               return 0;
> +       }
>         retval = 0;
>   done:
>         spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
> 
> 
> I didn't check locking or running contexts if it's ok to do this, there's could be a reason
> why it wasn't done like this in the first place.
> Worth a shot still.

Yup - that also works. That still has the first status_urb NULL and one resume -> suspend race.
I just kept your dev_err in addition.

[18818.679289] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: Port Status Change Event for port 20
[18818.679299] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: resume root hub
[18818.679314] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: handle_port_status: starting port polling.
[18818.679329] usb usb2: usb_hcd_poll_rh_status, length 1, hcd->flags 0x195, status_urb NULL
[18818.679387] usbcore:usb_remote_wakeup: usb usb2: usb wakeup-resume
[18818.679395] usbcore:hcd_bus_resume: usb usb2: usb auto-resume
[18818.679416] usbcore:hub_resume: hub 2-0:1.0: hub_resume
[18818.679428] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 0 status  = 0x2a0
[18818.679439] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[18818.679476] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 1 status  = 0x2a0
[18818.679485] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[18818.679499] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x2a0
[18818.679503] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[18818.679518] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x4002c0
[18818.679522] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x4002c0
[18818.679538] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 4 status  = 0x2a0
[18818.679546] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[18818.679561] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 5 status  = 0x2a0
[18818.679565] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[18818.679586] usb usb2: usb_hcd_poll_rh_status, length 1, hcd->flags 0x1ad, status_urb Exists
[18818.679602] usbcore:hub_event: hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0000
[18818.679619] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 0 status  = 0xe0002a0
[18818.679645] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 1 status  = 0xe0002a0
[18818.679659] usbcore:hub_event: hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0010
[18818.679677] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 2 status  = 0xe0002a0
[18818.679695] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 3 status  = 0xe4002c0
[18818.679712] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 4 status  = 0xe0002a0
[18818.679729] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 5 status  = 0xe0002a0
[18818.679744] usbcore:hub_suspend: hub 2-0:1.0: hub_suspend
[18818.679762] usbcore:hcd_bus_suspend: usb usb2: bus auto-suspend, wakeup 1
[18818.679784] usbcore:hcd_bus_suspend: usb usb2: suspend raced with wakeup event
[18818.679789] usbcore:hcd_bus_resume: usb usb2: usb auto-resume
[18818.698146] usbcore:hub_resume: hub 2-0:1.0: hub_resume
[18818.698159] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 0 status  = 0x2a0
[18818.698164] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[18818.698188] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 1 status  = 0x2a0
[18818.698192] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[18818.698206] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 2 status  = 0x2a0
[18818.698210] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[18818.698224] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x4002c0
[18818.698229] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x4002c0
[18818.698243] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 4 status  = 0x2a0
[18818.698247] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[18818.698263] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 5 status  = 0x2a0
[18818.698268] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2a0
[18818.698294] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x4002c0
[18818.698298] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x4002c0
[18818.698310] usbcore:port_event: usb usb2-port4: link state change
[18818.698320] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port link state change, actual port 3 status  = 0x2c0
[18818.698330] usbcore:port_event: usb usb2-port4: do warm reset
[18818.758189] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x2b0
[18818.758195] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x2b0
[18818.758222] usbcore:hub_port_wait_reset: usb usb2-port4: not warm reset yet, waiting 50ms
[18818.762303] xhci_hcd:xhci_hub_status_data: xhci_hcd 0000:00:14.0: xhci_hub_status_data: stopping port polling.
[18818.762312] usb usb2: usb_hcd_poll_rh_status, length 0, hcd->flags 0x1a1, status_urb Exists
[18818.804070] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: Port Status Change Event for port 20
[18818.804080] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: handle_port_status: starting port polling.
[18818.804095] usb usb2: usb_hcd_poll_rh_status, length 1, hcd->flags 0x1a5, status_urb Exists
[18818.818339] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x2a1203
[18818.818345] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x310203
[18818.818387] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port reset change, actual port 3 status  = 0xa1203
[18818.818404] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port warm(BH) reset change, actual port 3 status  = 0x21203
[18818.818422] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port link state change, actual port 3 status  = 0x21203
[18818.818435] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x21203
[18818.818439] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x10203
[18818.878337] usbcore:hub_event: hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0010
[18818.878355] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x21203
[18818.878360] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x10203
[18818.878392] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port connect change, actual port 3 status  = 0x1203
[18818.878406] usbcore:hub_port_connect_change: usb usb2-port4: status 0203, change 0001, 10.0 Gb/s
[18818.878414] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x1203
[18818.878418] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x203
[18818.914324] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x1203
[18818.914330] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x203
[18818.950175] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x1203
[18818.950181] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x203
[18818.986348] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x1203
[18818.986354] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x203
[18819.010300] xhci_hcd:xhci_hub_status_data: xhci_hcd 0000:00:14.0: xhci_hub_status_data: stopping port polling.
[18819.010309] usb usb2: usb_hcd_poll_rh_status, length 0, hcd->flags 0x1a1, status_urb Exists
[18819.022316] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x1203
[18819.022322] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x203
[18819.022349] usbcore:hub_port_debounce: usb usb2-port4: debounce total 100ms stable 100ms status 0x203
[18819.022360] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[18819.022457] xhci_hcd:xhci_alloc_virt_device: xhci_hcd 0000:00:14.0: Slot 1 output ctx = 0x44ea02000 (dma)
[18819.022463] xhci_hcd:xhci_alloc_virt_device: xhci_hcd 0000:00:14.0: Slot 1 input ctx = 0x458568000 (dma)
[18819.022475] xhci_hcd:xhci_alloc_virt_device: xhci_hcd 0000:00:14.0: Set slot id 1 dcbaa entry 0000000095bf5995 to 0x44ea02000
[18819.022522] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x1203
[18819.022526] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x203
[18819.022546] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port reset, actual port 3 status  = 0x1311
[18819.022554] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: Port Status Change Event for port 20
[18819.022559] xhci_hcd:handle_port_status: xhci_hcd 0000:00:14.0: handle_port_status: starting port polling.
[18819.022572] usb usb2: usb_hcd_poll_rh_status, length 0, hcd->flags 0x1a5, status_urb Exists
[18819.090325] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x201203
[18819.090331] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x100203
[18819.090366] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port reset change, actual port 3 status  = 0x1203
[18819.090382] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port warm(BH) reset change, actual port 3 status  = 0x1203
[18819.090401] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port link state change, actual port 3 status  = 0x1203
[18819.090415] xhci_hcd:xhci_clear_port_change_bit: xhci_hcd 0000:00:14.0: clear port connect change, actual port 3 status  = 0x1203
[18819.090430] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 3 status  = 0x1203
[18819.090434] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: Get port status returned 0x203
[18819.150399] xhci_hcd:xhci_setup_addressable_virt_dev: xhci_hcd 0000:00:14.0: Set root hub portnum to 20
[18819.150405] xhci_hcd:xhci_setup_addressable_virt_dev: xhci_hcd 0000:00:14.0: Set fake root hub portnum to 4
[18819.150410] xhci_hcd:xhci_setup_addressable_virt_dev: xhci_hcd 0000:00:14.0: udev->tt =           (null)
[18819.150415] xhci_hcd:xhci_setup_addressable_virt_dev: xhci_hcd 0000:00:14.0: udev->ttport = 0x0
[18819.150422] xhci_hcd:xhci_ring_cmd_db: xhci_hcd 0000:00:14.0: // Ding dong!
[18819.150479] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Successful setup address command
[18819.150488] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Op regs DCBAA ptr = 0x00000457c1b000
[18819.150495] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Slot ID 1 dcbaa entry @0000000095bf5995 = 0x0000044ea02000
[18819.150500] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Output Context DMA address = 0x44ea02000
[18819.150505] xhci_hcd:xhci_dbg_trace: xhci_hcd 0000:00:14.0: Internal device address = 0
[18819.150514] usb 2-4: new SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[18819.170820] usbcore:usb_parse_endpoint: usb 2-4: skipped 1 descriptor after endpoint
[18819.170826] usbcore:usb_parse_endpoint: usb 2-4: skipped 1 descriptor after endpoint
[18819.170858] xhci_hcd:process_ctrl_td: xhci_hcd 0000:00:14.0: Waiting for status stage event
[18819.170904] usbcore:usb_get_langid: usb 2-4: default language 0x0409
[18819.170940] xhci_hcd:process_ctrl_td: xhci_hcd 0000:00:14.0: Waiting for status stage event
[18819.171018] xhci_hcd:process_ctrl_td: xhci_hcd 0000:00:14.0: Waiting for status stage event
[18819.171098] xhci_hcd:process_ctrl_td: xhci_hcd 0000:00:14.0: Waiting for status stage event
[18819.171147] usbcore:usb_new_device: usb 2-4: udev 2, busnum 2, minor = 129
[18819.171154] usb 2-4: New USB device found, idVendor=0781, idProduct=5596, bcdDevice= 1.00
[18819.171159] usb 2-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[18819.171163] usb 2-4: Product: Ultra T C
[18819.171166] usb 2-4: Manufacturer: SanDisk
[18819.171170] usb 2-4: SerialNumber: 4C530001090830111403
[18819.171397] usbcore:usb_probe_device: usb 2-4: usb_probe_device
....

So is your patch a general solution?
Can we get this as a bug fix into the kernel?

Jan-Marek
Alan Stern Jan. 31, 2019, 4:53 p.m. UTC | #6
This thread has been hard to follow.  I'll try to answer some of your
concerns below...

On Thu, 31 Jan 2019, Jan-Marek Glogowski wrote:

> Am 31.01.19 um 15:56 schrieb Mathias Nyman:
> > On 31.01.2019 12:42, Jan-Marek Glogowski wrote:
> >> Am 30.01.19 um 17:56 schrieb Jan-Marek Glogowski:
> >>> Am 30.01.19 um 15:58 schrieb Mathias Nyman:
> >>>> On 30.01.2019 14:38, Jan-Marek Glogowski wrote:
> >>>>
> >>>> The hub->event bits are set when roothub polling timer function gets data from
> >>>> hub_status_data(), writes the data to a status urb, and returns the urb.

You have been concentrating on what happens with root hubs 
(understandable, since that's where your problem occurs).  But in order 
to understand the code, you have to realize that it is meant to apply 
to _all_ hubs, both root and external.

> >>>> Normal operation:
> >>>> usb_hcd_poll_rh_status()        // roothub polling timer function
> >>>>    hcd->driver->hub_status_data(hcd, buffer)   -> xhci_hub_status_data()
> >>>>      xhci_hub_status_data()      //returns buffer with bits set if a port needs attention
> >>>>    if (buffer && urb)
> >>>>      memcpy(urb->transfer_buffer, buffer)       // copy buffer bits to urb
> >>>>      usb_hcd_giveback_urb(urb)  -> hub_irq      // calls urb complete callback function
> >>>>        hub_irq()
> >>>>          hub->event_bits = urb->transfer_buffer //
> >>>>    mod_timer(usb_hcd_poll_rh_status, future)    // reschedule roothub polling timer
> >>>>
> >>>> I'm guessing usb_hcd_poll_rh_status() never properly completes a URB in your case.

It's not just a question of a timer expiring.  If we were dealing with 
an external hub, the code would have to wait until an actual USB 
transfer had completed.  There's no way to avoid that wait.

> >>>> This could be possible if roothub is polled before properly resumed, then the URB wouldn't
> >>>> be available yet, and no hub->event_bits are set.
> >>>> Because no bits are set the hub starts suspending again after resume, probably before roothub
> >>>> timer polls roothub again.
> >>>> suspend fails when it manually checks hcd->driver->hub_status_data() for activity, and
> >>>> it start resuming again, and continues this in a loop.
> >>>>
> >>>> Either roothub polling is stopped at the suspend attempt, or maybe continuously rescheduled
> >>>> into the future because as a part of suspend attempt/resume loop.
> >>
> >> So as you expected the status_urb is always NULL.
> >>
> >> I tried to understand more of the code by adding some more debug message.
> >>
> >> I looked at usb_hcd_resume_root_hub, which is called when we see the "resume root hub" dmesg.
> >> This just queues the hcd->wakeup_work, so indeed, if usb_hcd_poll_rh_status is later called,
> >> the hub might not have resumed yet and polling starts. So my simple approach was to add a
> >> function to call flush_work(&hcd->wakeup_work) before usb_hcd_poll_rh_status. All variants I
> >> tested killed the kernel. I also tried to add a udelay to no avail.
> > 
> > I think the idea is that if usb_hcd_poll_rh_status() finds a port that needs attention, but
> > the is URB missing, then a HCD_FLAG_POLL_PENDING flag is set.
> > 
> > This flag is checked in resume when submitting the hub URB, and idea is to react on it:
> > 
> > hub_activate(HUB_RESUME)
> >     usb_submit_urb(hub->urb)
> >      rh_urb_enqueue()
> >        rh_queue_status()
> >           hcd->status_urb = urb;
> >           if (HCD_POLL_PENDING(hcd)
> >              mod_timer(&hcd->rh_timer, jiffies);
> > 
> > It tries to trigger usb_hcd_poll_rh_status() immediately by modifying the timer,
> > but from your logs it seems that it's not fast enough:
> > 
> > [  485.301806] usbcore:hub_resume: hub 2-0:1.0: hub_resume
> > [  485.301819] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: get port status, actual port 0 status  = 0x2a0
> >  ...
> > [  485.301936] usbcore:hub_event: hub 2-0:1.0: state 7 ports 6 chg 0000 evt 0000
> > [  485.301952] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0: set port remote wake mask, actual port 0 status  = 0xe0002a0
> > ...
> > [  485.302034] usbcore:hub_suspend: hub 2-0:1.0: hub_suspend
> > [  485.302048] usbcore:hcd_bus_suspend: usb usb2: bus auto-suspend, wakeup 1
> > [  485.302067] usbcore:hcd_bus_suspend: usb usb2: suspend raced with wakeup event
> > [  485.302071] usbcore:hcd_bus_resume: usb usb2: usb auto-resume
> > [  485.309765] usb usb2: usb_hcd_poll_rh_status, length 1, hcd->flags 0x18d, status_urb NULL
> > 
> > hcd->flags 0x18d means HCD_FLAG_POLL_PENDING  was set, but hub_event() workqueue and hub_suspend()
> > are run before polling the root hub, and I guess the URB gets removed at some point during suspsend.
> > 
> > This then goes on in a loop.
> > 
> > You could try running usb_hcd_poll_rh_status() direcly instead of just kicking the timer and see if
> > it works:
> > 
> > @@ -822,8 +826,11 @@ static int rh_queue_status (struct usb_hcd *hcd, struct urb *urb)
> >                 mod_timer(&hcd->rh_timer, (jiffies/(HZ/4) + 1) * (HZ/4));
> >  
> >         /* If a status change has already occurred, report it ASAP */
> > -       else if (HCD_POLL_PENDING(hcd))
> > -               mod_timer(&hcd->rh_timer, jiffies);
> > +       else if (HCD_POLL_PENDING(hcd)) {
> > +               spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
> > +               usb_hcd_poll_rh_status(hcd);
> > +               return 0;
> > +       }
> >         retval = 0;
> >   done:
> >         spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
> > 
> > 
> > I didn't check locking or running contexts if it's ok to do this, there's could be a reason
> > why it wasn't done like this in the first place.
> > Worth a shot still.

It's not a good idea.  Deferring the operation to a timer keeps things
decoupled (running in different contexts with different stacks); doing
a direct call could get into a loop of nested calls that would keep
going deeper and deeper until it exhausted the kernel stack.

Perhaps you didn't notice that at the end, hub_activate() calls
kick_hub_wq().  That routine calls
usb_autopm_get_interface_no_resume(), which will prevent the hub from
suspending until hub_event() calls usb_autopm_put_interface().  
Therefore to make things work correctly, hub_activate() has to ensure
that the hub->event_bits and hub->change_bits fields are set correctly,
so that hub_event() will do the right thing when it runs (before the 
hub is allowed to go back into suspend).

Therefore if a port is in a funny state, hub_activate() should detect
this and set one of these bit fields appropriately.  That's one reason
why it contains all those tests for portstatus and portchange.  If a
missing test needs to be added, that's what you should do.

Alan Stern
Jan-Marek Glogowski Jan. 31, 2019, 6:52 p.m. UTC | #7
Am 31. Januar 2019 17:53:13 MEZ schrieb Alan Stern <stern@rowland.harvard.edu>:
>This thread has been hard to follow.  I'll try to answer some of your
>concerns below...
>
>On Thu, 31 Jan 2019, Jan-Marek Glogowski wrote:
>
>> Am 31.01.19 um 15:56 schrieb Mathias Nyman:
>> > On 31.01.2019 12:42, Jan-Marek Glogowski wrote:
>> >> Am 30.01.19 um 17:56 schrieb Jan-Marek Glogowski:
>> >>> Am 30.01.19 um 15:58 schrieb Mathias Nyman:
>> >>>> On 30.01.2019 14:38, Jan-Marek Glogowski wrote:
>> >>>>
>> >>>> The hub->event bits are set when roothub polling timer function
>gets data from
>> >>>> hub_status_data(), writes the data to a status urb, and returns
>the urb.
>
>You have been concentrating on what happens with root hubs 
>(understandable, since that's where your problem occurs).  But in order
>
>to understand the code, you have to realize that it is meant to apply 
>to _all_ hubs, both root and external.
>
>> >>>> Normal operation:
>> >>>> usb_hcd_poll_rh_status()        // roothub polling timer
>function
>> >>>>    hcd->driver->hub_status_data(hcd, buffer)   ->
>xhci_hub_status_data()
>> >>>>      xhci_hub_status_data()      //returns buffer with bits set
>if a port needs attention
>> >>>>    if (buffer && urb)
>> >>>>      memcpy(urb->transfer_buffer, buffer)       // copy buffer
>bits to urb
>> >>>>      usb_hcd_giveback_urb(urb)  -> hub_irq      // calls urb
>complete callback function
>> >>>>        hub_irq()
>> >>>>          hub->event_bits = urb->transfer_buffer //
>> >>>>    mod_timer(usb_hcd_poll_rh_status, future)    // reschedule
>roothub polling timer
>> >>>>
>> >>>> I'm guessing usb_hcd_poll_rh_status() never properly completes a
>URB in your case.
>
>It's not just a question of a timer expiring.  If we were dealing with 
>an external hub, the code would have to wait until an actual USB 
>transfer had completed.  There's no way to avoid that wait.
>
>> >>>> This could be possible if roothub is polled before properly
>resumed, then the URB wouldn't
>> >>>> be available yet, and no hub->event_bits are set.
>> >>>> Because no bits are set the hub starts suspending again after
>resume, probably before roothub
>> >>>> timer polls roothub again.
>> >>>> suspend fails when it manually checks
>hcd->driver->hub_status_data() for activity, and
>> >>>> it start resuming again, and continues this in a loop.
>> >>>>
>> >>>> Either roothub polling is stopped at the suspend attempt, or
>maybe continuously rescheduled
>> >>>> into the future because as a part of suspend attempt/resume
>loop.
>> >>
>> >> So as you expected the status_urb is always NULL.
>> >>
>> >> I tried to understand more of the code by adding some more debug
>message.
>> >>
>> >> I looked at usb_hcd_resume_root_hub, which is called when we see
>the "resume root hub" dmesg.
>> >> This just queues the hcd->wakeup_work, so indeed, if
>usb_hcd_poll_rh_status is later called,
>> >> the hub might not have resumed yet and polling starts. So my
>simple approach was to add a
>> >> function to call flush_work(&hcd->wakeup_work) before
>usb_hcd_poll_rh_status. All variants I
>> >> tested killed the kernel. I also tried to add a udelay to no
>avail.
>> > 
>> > I think the idea is that if usb_hcd_poll_rh_status() finds a port
>that needs attention, but
>> > the is URB missing, then a HCD_FLAG_POLL_PENDING flag is set.
>> > 
>> > This flag is checked in resume when submitting the hub URB, and
>idea is to react on it:
>> > 
>> > hub_activate(HUB_RESUME)
>> >     usb_submit_urb(hub->urb)
>> >      rh_urb_enqueue()
>> >        rh_queue_status()
>> >           hcd->status_urb = urb;
>> >           if (HCD_POLL_PENDING(hcd)
>> >              mod_timer(&hcd->rh_timer, jiffies);
>> > 
>> > It tries to trigger usb_hcd_poll_rh_status() immediately by
>modifying the timer,
>> > but from your logs it seems that it's not fast enough:
>> > 
>> > [  485.301806] usbcore:hub_resume: hub 2-0:1.0: hub_resume
>> > [  485.301819] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0:
>get port status, actual port 0 status  = 0x2a0
>> >  ...
>> > [  485.301936] usbcore:hub_event: hub 2-0:1.0: state 7 ports 6 chg
>0000 evt 0000
>> > [  485.301952] xhci_hcd:xhci_hub_control: xhci_hcd 0000:00:14.0:
>set port remote wake mask, actual port 0 status  = 0xe0002a0
>> > ...
>> > [  485.302034] usbcore:hub_suspend: hub 2-0:1.0: hub_suspend
>> > [  485.302048] usbcore:hcd_bus_suspend: usb usb2: bus auto-suspend,
>wakeup 1
>> > [  485.302067] usbcore:hcd_bus_suspend: usb usb2: suspend raced
>with wakeup event
>> > [  485.302071] usbcore:hcd_bus_resume: usb usb2: usb auto-resume
>> > [  485.309765] usb usb2: usb_hcd_poll_rh_status, length 1,
>hcd->flags 0x18d, status_urb NULL
>> > 
>> > hcd->flags 0x18d means HCD_FLAG_POLL_PENDING  was set, but
>hub_event() workqueue and hub_suspend()
>> > are run before polling the root hub, and I guess the URB gets
>removed at some point during suspsend.
>> > 
>> > This then goes on in a loop.
>> > 
>> > You could try running usb_hcd_poll_rh_status() direcly instead of
>just kicking the timer and see if
>> > it works:
>> > 
>> > @@ -822,8 +826,11 @@ static int rh_queue_status (struct usb_hcd
>*hcd, struct urb *urb)
>> >                 mod_timer(&hcd->rh_timer, (jiffies/(HZ/4) + 1) *
>(HZ/4));
>> >  
>> >         /* If a status change has already occurred, report it ASAP
>*/
>> > -       else if (HCD_POLL_PENDING(hcd))
>> > -               mod_timer(&hcd->rh_timer, jiffies);
>> > +       else if (HCD_POLL_PENDING(hcd)) {
>> > +               spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
>> > +               usb_hcd_poll_rh_status(hcd);
>> > +               return 0;
>> > +       }
>> >         retval = 0;
>> >   done:
>> >         spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
>> > 
>> > 
>> > I didn't check locking or running contexts if it's ok to do this,
>there's could be a reason
>> > why it wasn't done like this in the first place.
>> > Worth a shot still.
>
>It's not a good idea.  Deferring the operation to a timer keeps things
>decoupled (running in different contexts with different stacks); doing
>a direct call could get into a loop of nested calls that would keep
>going deeper and deeper until it exhausted the kernel stack.
>
>Perhaps you didn't notice that at the end, hub_activate() calls
>kick_hub_wq().  That routine calls
>usb_autopm_get_interface_no_resume(), which will prevent the hub trim
>suspending until hub_event() calls usb_autopm_put_interface().  
>Therefore to make things work correctly, hub_activate() has to ensure
>that the hub->event_bits and hub->change_bits fields are set correctly,
>so that hub_event() will do the right thing when it runs (before the 
>hub is allowed to go back into suspend).
>
>Therefore if a port is in a funny state, hub_activate() should detect
>this and set one of these bit fields appropriately.  That's one reason
>why it contains all those tests for portstatus and portchange.  If a
>missing test needs to be added, that's what you should do.

That would be like the v3 of my patch, which started this thread and detected and handled the warm-reset request of the port in hub_activate() on resume. Maybe we can set some of these bits instead, or its already handled by the hub_port_reset() call; have to check.

I also included a patch in some previous mail after Mathias Nyman commented about the NULL status_urb, which added usb_hcd_poll_rh_status directly after the root hubs usb_remote_wakeup call.

A lot of stuff is deferred via workqueue or polled. As I understand it - in my case - resume, suspend and status race and miss each other in some endless loop.

As things are, I just have the new desktop HWs and a single usb-c device to test the ports of them. I have no idea what would happen with an additional non-root hub, but I can probably organize one, if I have to test it.

Jan-Marek
Alan Stern Jan. 31, 2019, 9:11 p.m. UTC | #8
On Thu, 31 Jan 2019, Jan-Marek Glogowski wrote:

> >Perhaps you didn't notice that at the end, hub_activate() calls
> >kick_hub_wq().  That routine calls
> >usb_autopm_get_interface_no_resume(), which will prevent the hub trim
> >suspending until hub_event() calls usb_autopm_put_interface().  
> >Therefore to make things work correctly, hub_activate() has to ensure
> >that the hub->event_bits and hub->change_bits fields are set correctly,
> >so that hub_event() will do the right thing when it runs (before the 
> >hub is allowed to go back into suspend).
> >
> >Therefore if a port is in a funny state, hub_activate() should detect
> >this and set one of these bit fields appropriately.  That's one reason
> >why it contains all those tests for portstatus and portchange.  If a
> >missing test needs to be added, that's what you should do.
> 
> That would be like the v3 of my patch, which started this thread and
> detected and handled the warm-reset request of the port in
> hub_activate() on resume. Maybe we can set some of these bits
> instead, or its already handled by the hub_port_reset() call; have to
> check.

Suppose instead of making hub_activate() do the warm reset during
resume, you merely have it set the port's bit in hub->event_bits.  
Then hub_event() would see that it needed to call port_event() for that
port.  Near the end of port_event() there is code that checks whether a
warm reset is needed; would that existing code be able to solve your
problem?

> I also included a patch in some previous mail after Mathias Nyman
> commented about the NULL status_urb, which added
> usb_hcd_poll_rh_status directly after the root hubs usb_remote_wakeup
> call.
> 
> A lot of stuff is deferred via workqueue or polled. As I understand
> it - in my case - resume, suspend and status race and miss each other
> in some endless loop.

This indicates that something is not using the existing synchronization 
mechanisms properly.  For example, something sees that a status request 
is needed but doesn't store that information in the right way to 
prevent a suspend from happening before the status information has been 
retrieved.

> As things are, I just have the new desktop HWs and a single usb-c
> device to test the ports of them. I have no idea what would happen
> with an additional non-root hub, but I can probably organize one, if
> I have to test it.

I don't know if it would tell us anything, but if you want to get a 
USB-C hub and see how well it works, that would be great.

Alan Stern
Mathias Nyman Feb. 1, 2019, 11:46 a.m. UTC | #9
On 31.01.2019 23:11, Alan Stern wrote:
> On Thu, 31 Jan 2019, Jan-Marek Glogowski wrote:
> 
>>> Perhaps you didn't notice that at the end, hub_activate() calls
>>> kick_hub_wq().  That routine calls
>>> usb_autopm_get_interface_no_resume(), which will prevent the hub trim
>>> suspending until hub_event() calls usb_autopm_put_interface().
>>> Therefore to make things work correctly, hub_activate() has to ensure
>>> that the hub->event_bits and hub->change_bits fields are set correctly,
>>> so that hub_event() will do the right thing when it runs (before the
>>> hub is allowed to go back into suspend).
>>>
>>> Therefore if a port is in a funny state, hub_activate() should detect
>>> this and set one of these bit fields appropriately.  That's one reason
>>> why it contains all those tests for portstatus and portchange.  If a
>>> missing test needs to be added, that's what you should do.
>>
>> That would be like the v3 of my patch, which started this thread and
>> detected and handled the warm-reset request of the port in
>> hub_activate() on resume. Maybe we can set some of these bits
>> instead, or its already handled by the hub_port_reset() call; have to
>> check.
> 
> Suppose instead of making hub_activate() do the warm reset during
> resume, you merely have it set the port's bit in hub->event_bits.
> Then hub_event() would see that it needed to call port_event() for that
> port.  Near the end of port_event() there is code that checks whether a
> warm reset is needed; would that existing code be able to solve your
> problem?
> 

Sounds good.

So if we can't rely on a hub status urb returning before hub_event(), we
should make sure hub->event_bits are set manually in hub_activate() for all
the same cases as hcd->driver->hub_status_data() in hcd_bus_suspend().

Otherwise we can end up in a hub suspend-resume loop.

>> I also included a patch in some previous mail after Mathias Nyman
>> commented about the NULL status_urb, which added
>> usb_hcd_poll_rh_status directly after the root hubs usb_remote_wakeup
>> call.
>>
>> A lot of stuff is deferred via workqueue or polled. As I understand
>> it - in my case - resume, suspend and status race and miss each other
>> in some endless loop.
> 
> This indicates that something is not using the existing synchronization
> mechanisms properly.  For example, something sees that a status request
> is needed but doesn't store that information in the right way to
> prevent a suspend from happening before the status information has been
> retrieved.

This is what I believe is going on:

starting point: hub is suspended, xhci gets a port link status change interrupt,
port link is in invalid state, xhci driver resumes roothub and starts roothub polling

loop start:
usb_hcd_poll_rh_status()	// roothub polling
  see a port needs attention when calling hcd->driver->hub_status_data()
  no hcd->status_urb available as hub is suspended, can't giveback urb
hub_activate()			// hub resume
   manually check port status, nothing found.
   usb_submit_urb(hub->urb, GFP_NOIO)
   kick_hub_wq(hub);

hub_event()  			// started by kick_hub_wq() in hub_activate()
   hub->event_bits and hub->change_bits are zero as hub status urb didn't return.
   put_interface()		// as no activity found on hub.

start suspending hub, removes hub->status_urb somewhere in the process
hcd_bus_suspend()
   if (hcd->driver->hub_status_data()) // true, races with a root-hub event
     hcd_bus_resume()

usb_hcd_poll_rh_status() expires again at this point, goto loop start

To avoid this loop I thought we could force the hub status urb to complete at some point,
but as Alan commented it might be better to make sure the manual port status
checks in hub_activate() covers all the same cases as hcd->driver->hub_status_data().

-Mathias
diff mbox series

Patch

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index bb0830c..33d4821 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -108,6 +108,16 @@  EXPORT_SYMBOL_GPL(ehci_cf_port_reset_rwsem);
 static void hub_release(struct kref *kref);
 static int usb_reset_and_verify_device(struct usb_device *udev);
 static int hub_port_disable(struct usb_hub *hub, int port1, int set_state);
+static bool hub_port_warm_reset_required(struct usb_hub *hub, int port1,
+		u16 portstatus);
+static int hub_port_reset(struct usb_hub *hub, int port1,
+			struct usb_device *udev, unsigned int delay, bool warm);
+
+#define HUB_ROOT_RESET_TIME	60	/* times are in msec */
+#define HUB_SHORT_RESET_TIME	10
+#define HUB_BH_RESET_TIME	50
+#define HUB_LONG_RESET_TIME	200
+#define HUB_RESET_TIMEOUT	800
 
 static inline char *portspeed(struct usb_hub *hub, int portstatus)
 {
@@ -1147,6 +1157,11 @@  static void hub_activate(struct usb_hub *hub, enum hub_activation_type type)
 						USB_SS_PORT_LS_POLLING))
 			need_debounce_delay = true;
 
+		if (type == HUB_RESUME &&
+		    hub_port_warm_reset_required(hub, port1, portstatus))
+			hub_port_reset(hub, port1, udev,
+					HUB_BH_RESET_TIME, true);
+
 		/* Clear status-change flags; we'll debounce later */
 		if (portchange & USB_PORT_STAT_C_CONNECTION) {
 			need_debounce_delay = true;
@@ -2684,12 +2699,6 @@  static unsigned hub_is_wusb(struct usb_hub *hub)
 #define SET_CONFIG_TRIES	(2 * (use_both_schemes + 1))
 #define USE_NEW_SCHEME(i, scheme)	((i) / 2 == (int)scheme)
 
-#define HUB_ROOT_RESET_TIME	60	/* times are in msec */
-#define HUB_SHORT_RESET_TIME	10
-#define HUB_BH_RESET_TIME	50
-#define HUB_LONG_RESET_TIME	200
-#define HUB_RESET_TIMEOUT	800
-
 /*
  * "New scheme" enumeration causes an extra state transition to be
  * exposed to an xhci host and causes USB3 devices to receive control