diff mbox series

[RFT,v2] xhci: process isoc TD properly when there was an error mid TD.

Message ID 20240118135652.2629101-1-mathias.nyman@linux.intel.com (mailing list archive)
State Superseded
Headers show
Series [RFT,v2] xhci: process isoc TD properly when there was an error mid TD. | expand

Commit Message

Mathias Nyman Jan. 18, 2024, 1:56 p.m. UTC
The last TRB of a isoc TD might not trigger an event if there was
an error event for TRB mid TD. This is seen on a NEC Corporation
uPD720200 USB 3.0 Host

After an error mid a multi-TRB TD the xHC should according to xhci 4.9.1
generate events for passed TRBs with IOC flag set if it proceeds to the
next TD. This event is either a copy of the original error, or a
"success" transfer event.

If that event is missing then the driver and xHC host get out of sync
as driver is still expecting a transfer event for that first TD, while
xHC host is already sending events for the next TD in the list.
This leads to
"Transfer event TRB DMA ptr not part of current TD" messages.

As a solution we tag the isoc TDs that get error events mid TD.
If an event doesn't match the first TD, then check if the tag is
set, and event points to the next TD.
In that case give back the fist TD and process the next TD normally

Make sure TD status and tranferred length stay valid in all cases.

Reported-by: Michał Pecio <michal.pecio@gmail.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
  v1 -> v2:
  - Check for empty TD list, suggested by Michał
  - Don't overwrite error status, suggested by Michał
  - Add a debug print for missed TD completion event, suggested by Michał
  - Reword commit message and code comments
  - Set correct TD transfer length in isoc error case

drivers/usb/host/xhci-ring.c | 81 +++++++++++++++++++++++++++++-------
 drivers/usb/host/xhci.h      |  1 +
 2 files changed, 66 insertions(+), 16 deletions(-)

Comments

Michał Pecio Jan. 18, 2024, 10:16 p.m. UTC | #1
My usual set of tests passes:
- no spam on disconnection from NEC
- no stream lockup on random errors on NEC
- no spam on disconnection from VIA
- finish_td called with right frame->status on VIA
  (checked by means of extra printks)

> +		/* Error mid TD, don't give TD back yet */
> +		td->error_mid_td = true;
> +		td->urb_length_set = true;
> +
> +		frame->actual_length = sum_trb_lengths(xhci, ep->ring, ep_trb) +
> +			ep_trb_len - remaining;
Not a problem with this patch, but I noticed that every single use of
this function ends up adding ep_trb_len, maybe it could be inclusive.

> +		td->urb->actual_length += frame->actual_length;
In your first email you mentioned hosts responding to every single TRB,
perhaps with the same error code repeated each time?

I imagine it could be problematic here if such hosts really exist and
if there are enough TRBs to execute this line twice. A check for the
error_mid_td bit previously set could help, if this is a real risk.


Thanks,
Michal
Mathias Nyman Jan. 19, 2024, 10:49 a.m. UTC | #2
On 19.1.2024 0.16, Michał Pecio wrote:
> My usual set of tests passes:
> - no spam on disconnection from NEC
> - no stream lockup on random errors on NEC
> - no spam on disconnection from VIA
> - finish_td called with right frame->status on VIA
>    (checked by means of extra printks)
> 
>> +		/* Error mid TD, don't give TD back yet */
>> +		td->error_mid_td = true;
>> +		td->urb_length_set = true;
>> +
>> +		frame->actual_length = sum_trb_lengths(xhci, ep->ring, ep_trb) +
>> +			ep_trb_len - remaining;
> Not a problem with this patch, but I noticed that every single use of
> this function ends up adding ep_trb_len, maybe it could be inclusive.
> 
>> +		td->urb->actual_length += frame->actual_length;
> In your first email you mentioned hosts responding to every single TRB,
> perhaps with the same error code repeated each time?
> 
> I imagine it could be problematic here if such hosts really exist and
> if there are enough TRBs to execute this line twice. A check for the
> error_mid_td bit previously set could help, if this is a real risk.

Good point, refactored that code a bit and it now both looks nicer and should
solve this case.

One more patchround

Thanks
Mathias
diff mbox series

Patch

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 33806ae966f9..9f0f694358cc 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2376,6 +2376,9 @@  static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 	/* handle completion code */
 	switch (trb_comp_code) {
 	case COMP_SUCCESS:
+		/* Don't overwrite status if TD had an error, see xHCI 4.9.1 */
+		if (td->error_mid_td)
+			break;
 		if (remaining) {
 			frame->status = short_framestatus;
 			if (xhci->quirks & XHCI_TRUST_TX_LENGTH)
@@ -2401,9 +2404,20 @@  static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 		break;
 	case COMP_USB_TRANSACTION_ERROR:
 		frame->status = -EPROTO;
-		if (ep_trb != td->last_trb)
-			return 0;
-		break;
+		sum_trbs_for_length = true;
+
+		if (ep_trb == td->last_trb)
+			break;
+
+		/* Error mid TD, don't give TD back yet */
+		td->error_mid_td = true;
+		td->urb_length_set = true;
+
+		frame->actual_length = sum_trb_lengths(xhci, ep->ring, ep_trb) +
+			ep_trb_len - remaining;
+		td->urb->actual_length += frame->actual_length;
+		return 0;
+
 	case COMP_STOPPED:
 		sum_trbs_for_length = true;
 		break;
@@ -2422,6 +2436,9 @@  static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 		break;
 	}
 
+	if (td->urb_length_set)
+		goto finish_td;
+
 	if (sum_trbs_for_length)
 		frame->actual_length = sum_trb_lengths(xhci, ep->ring, ep_trb) +
 			ep_trb_len - remaining;
@@ -2430,6 +2447,7 @@  static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 
 	td->urb->actual_length += frame->actual_length;
 
+finish_td:
 	return finish_td(xhci, ep, ep_ring, td, trb_comp_code);
 }
 
@@ -2808,17 +2826,51 @@  static int handle_tx_event(struct xhci_hcd *xhci,
 		}
 
 		if (!ep_seg) {
-			if (!ep->skip ||
-			    !usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
-				/* Some host controllers give a spurious
-				 * successful event after a short transfer.
-				 * Ignore it.
-				 */
-				if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
-						ep_ring->last_td_was_short) {
-					ep_ring->last_td_was_short = false;
-					goto cleanup;
+
+			if (ep->skip && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
+				skip_isoc_td(xhci, td, ep, status);
+				goto cleanup;
+			}
+
+			/*
+			 * Some hosts give a spurious success event after a short
+			 * transfer. Ignore it.
+			 */
+			if ((xhci->quirks & XHCI_SPURIOUS_SUCCESS) &&
+			    ep_ring->last_td_was_short) {
+				ep_ring->last_td_was_short = false;
+				goto cleanup;
+			}
+
+			/*
+			 * xhci 4.10.2 states isoc endpoints should continue
+			 * processing the next TD if there was an error mid TD.
+			 * So host like NEC don't generate an event for the last
+			 * isoc TRB even if the IOC flag is set.
+			 * xhci 4.9.1 states that if there are errors in mult-TRB
+			 * TDs xHC should generate an error for that TRB, and if xHC
+			 * proceeds to the next TD it should genete an event for
+			 * any TRB with IOC flag on the way. Other host follow this.
+			 * So this event might be for the next TD.
+			 */
+			if (td->error_mid_td &&
+			    !list_is_last(&td->td_list, &ep_ring->td_list)) {
+				struct xhci_td *td_next = list_next_entry(td, td_list);
+
+				ep_seg = trb_in_td(xhci, td_next->start_seg, td_next->first_trb,
+						   td_next->last_trb, ep_trb_dma, false);
+				if (ep_seg) {
+					/* give back previous TD, start handling new */
+					xhci_dbg(xhci, "Missing TD completion event after mid TD error\n");
+					ep_ring->dequeue = td->last_trb;
+					ep_ring->deq_seg = td->last_trb_seg;
+					inc_deq(xhci, ep_ring);
+					xhci_td_cleanup(xhci, td, ep_ring, td->status);
+					td = td_next;
 				}
+			}
+
+			if (!ep_seg) {
 				/* HC is busted, give up! */
 				xhci_err(xhci,
 					"ERROR Transfer event TRB DMA ptr not "
@@ -2830,9 +2882,6 @@  static int handle_tx_event(struct xhci_hcd *xhci,
 					  ep_trb_dma, true);
 				return -ESHUTDOWN;
 			}
-
-			skip_isoc_td(xhci, td, ep, status);
-			goto cleanup;
 		}
 		if (trb_comp_code == COMP_SHORT_PACKET)
 			ep_ring->last_td_was_short = true;
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index a5c72a634e6a..6f82d404883f 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1549,6 +1549,7 @@  struct xhci_td {
 	struct xhci_segment	*bounce_seg;
 	/* actual_length of the URB has already been set */
 	bool			urb_length_set;
+	bool			error_mid_td;
 	unsigned int		num_trbs;
 };