xhci: retry Stop Endpoint on buggy NEC controllers

Message ID 20240126113744.7671c52d@foxbook (mailing list archive)
State Accepted
Commit fd9d55d190c0e5fefd3a9165ea361809427885a1

Commit Message

Michał Pecio Jan. 26, 2024, 10:37 a.m. UTC
Two NEC uPD720200 adapters have been observed to randomly misbehave:
a Stop Endpoint command fails with Context Error, the Output Context
indicates Stopped state, and the endpoint keeps running. Very often,
Set TR Dequeue Pointer is seen to fail next with Context Error too,
in addition to problems from unexpectedly completed cancelled work.

The pathology is common on fast running isoc endpoints like uvcvideo,
but has also been reproduced on a full-speed bulk endpoint of pl2303.
It seems all EPs are affected, with risk proportional to their load.

Reproduction involves receiving any kind of stream and closing it to
make the device driver cancel URBs already queued in advance.

Deal with it by retrying the command like in the Running state.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---

This should be my last patch for NEC eccentricities since everything is
working smoothly now. I may or may not find something interesting on the
VL805. It still crashes sometimes for no obvious reason, needing reboot.

I thought it prudent to trigger this on the uPD720200 only; hell knows
what bugs other controllers have. Note that the NEC quirk applies to
this specific chip only: its successors use the Renesas vendor ID.
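
To make the gating concrete, here is a minimal standalone model of the
decision, not driver code. It assumes the quirk is keyed on the PCI vendor
ID alone; the names VENDOR_*, QUIRK_NEC_HOST, ep_ctx_state,
quirks_for_vendor() and should_retry_stop() are invented for this sketch.
Only the vendor ID values and the endpoint state encoding are real.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI vendor IDs: NEC is 0x1033, Renesas is 0x1912. */
#define VENDOR_NEC      0x1033
#define VENDOR_RENESAS  0x1912

/* Illustrative quirk bit; stands in for the driver's XHCI_NEC_HOST. */
#define QUIRK_NEC_HOST  (1u << 0)

/* Endpoint Context states, numbered as in the xHCI specification. */
enum ep_ctx_state {
	CTX_DISABLED = 0,
	CTX_RUNNING  = 1,
	CTX_HALTED   = 2,
	CTX_STOPPED  = 3,
	CTX_ERROR    = 4,
};

/* Assumption: the quirk is set purely from the PCI vendor ID. */
static uint32_t quirks_for_vendor(uint16_t vendor)
{
	return vendor == VENDOR_NEC ? QUIRK_NEC_HOST : 0;
}

/*
 * Should a Stop Endpoint command that failed with Context Error be
 * issued again? Running always retries; Stopped retries only when the
 * NEC quirk is set, because other chips are trusted to report the
 * state truthfully.
 */
static bool should_retry_stop(uint32_t quirks, enum ep_ctx_state state)
{
	switch (state) {
	case CTX_STOPPED:
		if (!(quirks & QUIRK_NEC_HOST))
			return false;
		/* fall through */
	case CTX_RUNNING:
		return true;
	default:
		return false;
	}
}
```

With this model a Renesas-branded successor reporting Stopped is left
alone, while an NEC part in the same situation gets the command again.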

I feel a little dirty retrying something with no obvious stop condition;
is there anything that prevents this from trying forever if things go
really wrong? Same question for the "running" case. I figure a counter
could be easily added for both, if necessary.
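
For illustration only, such a counter might look roughly like the sketch
below. Nothing here is part of the patch: struct ep_retry, its
stop_retries field, the STOP_EP_MAX_RETRIES cap of 8 and both helpers are
hypothetical, and a real version would have to live in the driver's
per-endpoint state and decide what to do once the cap is hit.

```c
#include <stdbool.h>

/* Hypothetical cap; the value 8 is arbitrary, not a tested number. */
#define STOP_EP_MAX_RETRIES 8

/* Hypothetical per-endpoint bookkeeping for Stop Endpoint retries. */
struct ep_retry {
	unsigned int stop_retries;
};

/*
 * Call before queuing the command again. Returns true and counts the
 * attempt while under the cap; returns false once the cap is reached,
 * so the caller can stop retrying and handle the failure some other way.
 */
static bool stop_retry_allowed(struct ep_retry *r)
{
	if (r->stop_retries >= STOP_EP_MAX_RETRIES)
		return false;
	r->stop_retries++;
	return true;
}

/* Call when a Stop Endpoint command finally completes successfully. */
static void stop_retry_reset(struct ep_retry *r)
{
	r->stop_retries = 0;
}
```

The same helpers could guard the "running" case as well, since both
paths reissue the same command.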

I typically see 1 to 3 retries before the command succeeds.

[  +0,000008] usb 9-2: Selecting alternate setting 9 (20480 B/frame bandwidth)
[  +0,005639] usb 9-2: Allocated 5 URB buffers of 32x20480 bytes each
[  +0,292400] xhci_hcd 0000:02:00.0: Retrying STOP on buggy NEC
[  +0,000051] xhci_hcd 0000:02:00.0: Retrying STOP on buggy NEC
[  +0,000109] xhci_hcd 0000:02:00.0: It worked!
[  +0,000087] xhci_hcd 0000:02:00.0: Retrying STOP on buggy NEC
[  +0,000047] xhci_hcd 0000:02:00.0: Retrying STOP on buggy NEC
[  +0,000117] xhci_hcd 0000:02:00.0: It worked!
[  +0,000040] xhci_hcd 0000:02:00.0: Retrying STOP on buggy NEC
[  +0,000040] xhci_hcd 0000:02:00.0: Retrying STOP on buggy NEC
[  +0,000045] xhci_hcd 0000:02:00.0: Retrying STOP on buggy NEC
[  +0,000123] xhci_hcd 0000:02:00.0: It worked!



 drivers/usb/host/xhci-ring.c | 9 +++++++++
 1 file changed, 9 insertions(+)
Patch

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 9673354d70d5..7edd655cb6b4 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1147,6 +1147,15 @@  static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id,
 				break;
 			ep->ep_state &= ~EP_STOP_CMD_PENDING;
 			return;
+		case EP_STATE_STOPPED:
+			/*
+			 * NEC uPD720200 sometimes sets this state and fails with
+			 * Context Error while continuing to process TRBs.
+			 * Be conservative and trust EP_CTX_STATE on other chips.
+			 */
+			if (!(xhci->quirks & XHCI_NEC_HOST))
+				break;
+			fallthrough;
 		case EP_STATE_RUNNING:
 			/* Race, HW handled stop ep cmd before ep was running */
 			xhci_dbg(xhci, "Stop ep completion ctx error, ep is running\n");