diff mbox series

xhci: clear port_remote_wakeup after resume failure

Message ID 20190524145231.6605-1-nsaenzjulienne@suse.de (mailing list archive)
State Superseded
Headers show
Series xhci: clear port_remote_wakeup after resume failure | expand

Commit Message

Nicolas Saenz Julienne May 24, 2019, 2:52 p.m. UTC
This was seen on a Dell Precision 5520 using it's WD15 dock. The dock's
Ethernet device interfaces with the laptop through one of it's USB3
ports. While idle, the Ethernet device and HCD are suspended by runtime
PM, being the only device connected on the bus. Then, both are resumed on
behalf of the Ethernet device, which has remote wake-up capabilities.

The Ethernet device was observed to randomly disconnect from the USB
port shortly after submitting it's remote wake-up request. Probably a
weird timing issue yet to be investigated. This causes runtime PM to
busyloop causing some tangible CPU load. The reason is the port gets
stuck in the middle of a remote wake-up operation, waiting for the
device to switch to U0. This never happens, leaving "port_remote_wakeup"
enabled, and automatically triggering a failure on any further suspend
operation.

This patch clears "port_remote_wakeup" upon detecting a device with a
wrong resuming port state (see Table 4-9 in 4.15.2.3). Making sure the
above mentioned situation doesn't trigger a PM busyloop.

Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de>
---
 drivers/usb/host/xhci-hub.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Mathias Nyman May 27, 2019, 11:16 a.m. UTC | #1
On 24.5.2019 17.52, Nicolas Saenz Julienne wrote:
> This was seen on a Dell Precision 5520 using it's WD15 dock. The dock's
> Ethernet device interfaces with the laptop through one of it's USB3
> ports. While idle, the Ethernet device and HCD are suspended by runtime
> PM, being the only device connected on the bus. Then, both are resumed on
> behalf of the Ethernet device, which has remote wake-up capabilities.
> 
> The Ethernet device was observed to randomly disconnect from the USB
> port shortly after submitting it's remote wake-up request. Probably a
> weird timing issue yet to be investigated. This causes runtime PM to
> busyloop causing some tangible CPU load. The reason is the port gets
> stuck in the middle of a remote wake-up operation, waiting for the
> device to switch to U0. This never happens, leaving "port_remote_wakeup"
> enabled, and automatically triggering a failure on any further suspend
> operation.
> 
> This patch clears "port_remote_wakeup" upon detecting a device with a
> wrong resuming port state (see Table 4-9 in 4.15.2.3). Making sure the
> above mentioned situation doesn't trigger a PM busyloop.
> 

There was a similar case where the USB3 link had transitioned to a
lower power U1 or U2 state after resume, before driver read the state,
leaving port_remote_wakeup flag uncleared.

This was fixed in 5.1 kernel by commit:

6cbcf59 xhci: Fix port resume done detection for SS ports with LPM enable

Can you check if you have it?
It should be in recent stable releases as well.

-Mathias
Nicolas Saenz Julienne May 27, 2019, 11:28 a.m. UTC | #2
Hi Matthias,
thanks for the review.

On Mon, 2019-05-27 at 14:16 +0300, Mathias Nyman wrote:
> On 24.5.2019 17.52, Nicolas Saenz Julienne wrote:
> > This was seen on a Dell Precision 5520 using it's WD15 dock. The dock's
> > Ethernet device interfaces with the laptop through one of it's USB3
> > ports. While idle, the Ethernet device and HCD are suspended by runtime
> > PM, being the only device connected on the bus. Then, both are resumed on
> > behalf of the Ethernet device, which has remote wake-up capabilities.
> > 
> > The Ethernet device was observed to randomly disconnect from the USB
> > port shortly after submitting it's remote wake-up request. Probably a
> > weird timing issue yet to be investigated. This causes runtime PM to
> > busyloop causing some tangible CPU load. The reason is the port gets
> > stuck in the middle of a remote wake-up operation, waiting for the
> > device to switch to U0. This never happens, leaving "port_remote_wakeup"
> > enabled, and automatically triggering a failure on any further suspend
> > operation.
> > 
> > This patch clears "port_remote_wakeup" upon detecting a device with a
> > wrong resuming port state (see Table 4-9 in 4.15.2.3). Making sure the
> > above mentioned situation doesn't trigger a PM busyloop.
> > 
> 
> There was a similar case where the USB3 link had transitioned to a
> lower power U1 or U2 state after resume, before driver read the state,
> leaving port_remote_wakeup flag uncleared.
> 
> This was fixed in 5.1 kernel by commit:
> 
> 6cbcf59 xhci: Fix port resume done detection for SS ports with LPM enable
> 
> Can you check if you have it?
> It should be in recent stable releases as well.

I was aware of that patch, unfortunately it doesn't address the same issue. In
my case I never get a second port status event (so no PLC == 1 or any state
change seen in PLS). The device simply disconnects from the bus.

I did test both the issue and fix on top of last week's master branch.

Regards,
Nicolas
Mathias Nyman June 4, 2019, 1:53 p.m. UTC | #3
On 27.5.2019 14.28, Nicolas Saenz Julienne wrote:
> Hi Matthias,
> thanks for the review.
> 
> On Mon, 2019-05-27 at 14:16 +0300, Mathias Nyman wrote:
>> On 24.5.2019 17.52, Nicolas Saenz Julienne wrote:
>>> This was seen on a Dell Precision 5520 using it's WD15 dock. The dock's
>>> Ethernet device interfaces with the laptop through one of it's USB3
>>> ports. While idle, the Ethernet device and HCD are suspended by runtime
>>> PM, being the only device connected on the bus. Then, both are resumed on
>>> behalf of the Ethernet device, which has remote wake-up capabilities.
>>>
>>> The Ethernet device was observed to randomly disconnect from the USB
>>> port shortly after submitting it's remote wake-up request. Probably a
>>> weird timing issue yet to be investigated. This causes runtime PM to
>>> busyloop causing some tangible CPU load. The reason is the port gets
>>> stuck in the middle of a remote wake-up operation, waiting for the
>>> device to switch to U0. This never happens, leaving "port_remote_wakeup"
>>> enabled, and automatically triggering a failure on any further suspend
>>> operation.
>>>
>>> This patch clears "port_remote_wakeup" upon detecting a device with a
>>> wrong resuming port state (see Table 4-9 in 4.15.2.3). Making sure the
>>> above mentioned situation doesn't trigger a PM busyloop.
>>>
>>
>> There was a similar case where the USB3 link had transitioned to a
>> lower power U1 or U2 state after resume, before driver read the state,
>> leaving port_remote_wakeup flag uncleared.
>>
>> This was fixed in 5.1 kernel by commit:
>>
>> 6cbcf59 xhci: Fix port resume done detection for SS ports with LPM enable
>>
>> Can you check if you have it?
>> It should be in recent stable releases as well.
> 
> I was aware of that patch, unfortunately it doesn't address the same issue. In
> my case I never get a second port status event (so no PLC == 1 or any state
> change seen in PLS). The device simply disconnects from the bus.
> 

I see, ok, then we need to clear the flag in the hub thread.

But to me it looks like this patch could cause a small race risk in the successful
device initiated resume cases.

If the hub thread, i.e. the get_port_status() function, notices the U0 state before
the interrupt handler, i.e. handle_port_status() function, then port_remote_wakeup
flag is cleared in the hub thread and the wakeup notification is never called from
handle_port_status().

Would it be enough to just check for (port_remote_wakeup flag && !PORT_CONNECT) in the hub thread?
USB3 PORT_CONNECT bit is lost in most error cases.

-Mathias
Nicolas Saenz Julienne June 8, 2019, 1:33 p.m. UTC | #4
On Tue, 2019-06-04 at 16:53 +0300, Mathias Nyman wrote:
> On 27.5.2019 14.28, Nicolas Saenz Julienne wrote:
> > Hi Matthias,
> > thanks for the review.
> > 
> > On Mon, 2019-05-27 at 14:16 +0300, Mathias Nyman wrote:
> > > On 24.5.2019 17.52, Nicolas Saenz Julienne wrote:
> > > > This was seen on a Dell Precision 5520 using it's WD15 dock. The dock's
> > > > Ethernet device interfaces with the laptop through one of it's USB3
> > > > ports. While idle, the Ethernet device and HCD are suspended by runtime
> > > > PM, being the only device connected on the bus. Then, both are resumed
> > > > on
> > > > behalf of the Ethernet device, which has remote wake-up capabilities.
> > > > 
> > > > The Ethernet device was observed to randomly disconnect from the USB
> > > > port shortly after submitting it's remote wake-up request. Probably a
> > > > weird timing issue yet to be investigated. This causes runtime PM to
> > > > busyloop causing some tangible CPU load. The reason is the port gets
> > > > stuck in the middle of a remote wake-up operation, waiting for the
> > > > device to switch to U0. This never happens, leaving "port_remote_wakeup"
> > > > enabled, and automatically triggering a failure on any further suspend
> > > > operation.
> > > > 
> > > > This patch clears "port_remote_wakeup" upon detecting a device with a
> > > > wrong resuming port state (see Table 4-9 in 4.15.2.3). Making sure the
> > > > above mentioned situation doesn't trigger a PM busyloop.
> > > > 
> > > 
> > > There was a similar case where the USB3 link had transitioned to a
> > > lower power U1 or U2 state after resume, before driver read the state,
> > > leaving port_remote_wakeup flag uncleared.
> > > 
> > > This was fixed in 5.1 kernel by commit:
> > > 
> > > 6cbcf59 xhci: Fix port resume done detection for SS ports with LPM enable
> > > 
> > > Can you check if you have it?
> > > It should be in recent stable releases as well.
> > 
> > I was aware of that patch, unfortunately it doesn't address the same issue.
> > In
> > my case I never get a second port status event (so no PLC == 1 or any state
> > change seen in PLS). The device simply disconnects from the bus.
> > 
> 
> I see, ok, then we need to clear the flag in the hub thread.
> 
> But to me it looks like this patch could cause a small race risk in the
> successful
> device initiated resume cases.
> 
> If the hub thread, i.e. the get_port_status() function, notices the U0 state
> before
> the interrupt handler, i.e. handle_port_status() function, then
> port_remote_wakeup
> flag is cleared in the hub thread and the wakeup notification is never called
> from
> handle_port_status().
> 
> Would it be enough to just check for (port_remote_wakeup flag &&
> !PORT_CONNECT) in the hub thread?
> USB3 PORT_CONNECT bit is lost in most error cases.

I get your concerns, there is a race indeed. On top of that, checking
PORT_CONNECT works fine for me.

So if I undertood your suggestion right, would this be fine? 

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 3abe70ff1b1e..253957dc62de 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1057,6 +1057,9 @@ static u32 xhci_get_port_status(struct usb_hcd *hcd,
                bus_state->resume_done[wIndex] = 0;
                clear_bit(wIndex, &bus_state->resuming_ports);
                usb_hcd_end_port_resume(&hcd->self, wIndex);
+       } else if (bus_state->port_remote_wakeup & (1 << port->hcd_portnum) &&
+                  !(raw_port_status & PORT_CONNECT)) {
+               bus_state->port_remote_wakeup &= ~(1 << port->hcd_portnum);
        }
 
        if (bus_state->port_c_suspend & (1 << wIndex))

Regards,
Nicolas
diff mbox series

Patch

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index 3abe70ff1b1e..53f5ee50ef8c 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1047,8 +1047,8 @@  static u32 xhci_get_port_status(struct usb_hcd *hcd,
 		xhci_get_usb2_port_status(port, &status, raw_port_status,
 					  flags);
 	/*
-	 * Clear stale usb2 resume signalling variables in case port changed
-	 * state during resume signalling. For example on error
+	 * Clear stale resume signalling variables in case port changed
+	 * state during resume signalling. For example on error.
 	 */
 	if ((bus_state->resume_done[wIndex] ||
 	     test_bit(wIndex, &bus_state->resuming_ports)) &&
@@ -1057,6 +1057,12 @@  static u32 xhci_get_port_status(struct usb_hcd *hcd,
 		bus_state->resume_done[wIndex] = 0;
 		clear_bit(wIndex, &bus_state->resuming_ports);
 		usb_hcd_end_port_resume(&hcd->self, wIndex);
+	} else if (bus_state->port_remote_wakeup & (1 << port->hcd_portnum) &&
+		   ((raw_port_status & PORT_PLS_MASK) != XDEV_RESUME ||
+		   !(raw_port_status & PORT_CONNECT) ||
+		   !(raw_port_status & PORT_PE) ||
+		   raw_port_status & PORT_OC)) {
+		bus_state->port_remote_wakeup &= ~(1 << port->hcd_portnum);
 	}
 
 	if (bus_state->port_c_suspend & (1 << wIndex))