diff mbox series

usb: xhci: Handle quirky SuperSpeed isoc error reporting by Etron HCs

Message ID 20250211133614.5d64301f@foxbook (mailing list archive)
State New
Headers show
Series usb: xhci: Handle quirky SuperSpeed isoc error reporting by Etron HCs | expand

Commit Message

Michał Pecio Feb. 11, 2025, 12:36 p.m. UTC
xHCI 4.9.1 requires HCs to obey the IOC flag we set on the last TRB even
after an error has been reported on an earlier TRB. This typically means
that an error mid TD is followed by a success event for the last TRB.

On SuperSpeed (and only SS) isochronous endpoints Etron hosts were found
to emit a success event also after an error on the last TRB of a TD.

Reuse the machinery for handling errors mid TD to handle these otherwise
unexpected events. Avoid printing "TRB not part of current TD" errors,
ensure proper tracking of HC's internal dequeue pointer and distinguish
this known quirk from other bogus events caused by ordinary bugs.

This patch was found to eliminate all related warnings and errors while
running for 30 minutes with a UVC camera on a flaky cable which produces
transaction errors about every second. An altsetting was chosen which
causes some TDs to be multi-TRB, dynamic debug was used to confirm that
errors both mid TD and on the last TRB are handled as expected:

[ 6028.439776] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint
[ 6028.439784] xhci_hcd 0000:06:00.0: Error 4 mid isoc TD, wait for final completion event, is_last_trb=1
[ 6028.440268] xhci_hcd 0000:06:00.0: Successful completion on short TX for slot 1 ep 2 with last td short 0
[ 6028.440270] xhci_hcd 0000:06:00.0: Got event 1 after mid TD error
[ 6029.123683] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint
[ 6029.123694] xhci_hcd 0000:06:00.0: Error 4 mid isoc TD, wait for final completion event, is_last_trb=0
[ 6029.123697] xhci_hcd 0000:06:00.0: Successful completion on short TX for slot 1 ep 2 with last td short 0
[ 6029.123700] xhci_hcd 0000:06:00.0: Got event 1 after mid TD error

Handling of Stopped events is unaffected: finish_td() is called but it
does nothing and the TD waits until it's unlinked:

[ 7081.705544] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint
[ 7081.705546] xhci_hcd 0000:06:00.0: Error 4 mid isoc TD, wait for final completion event, is_last_trb=1
[ 7081.705630] xhci_hcd 0000:06:00.0: Stopped on Transfer TRB for slot 1 ep 2
[ 7081.705633] xhci_hcd 0000:06:00.0: Got event 26 after mid TD error
[ 7081.705678] xhci_hcd 0000:06:00.0: Stopped on Transfer TRB for slot 1 ep 2
[ 7081.705680] xhci_hcd 0000:06:00.0: Got event 26 after mid TD error
[ 7081.705759] xhci_hcd 0000:06:00.0: Stopped on No-op or Link TRB for slot 1 ep 2
[ 7081.705799] xhci_hcd 0000:06:00.0: Stopped on No-op or Link TRB for slot 1 ep 2

Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com>
Closes: https://lore.kernel.org/linux-usb/20250205053750.28251-1-ki.chiang65@gmail.com/T/
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---



Hi Mathias,

This is the best I was able to do. It does add a few lines, but I don't
think it's too scary and IMO the switch looks even better this way. It
accurately predicts those events while not breaking anything else that
I can see or think of, save for the risk of firmware bugfix adding one
ESIT of latency on errors.

I tried to also test your Etron patch but it has whitespace damage all
over the place and would be hard to apply.

Regards,
Michal


 drivers/usb/host/xhci-ring.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

Comments

Kuangyi Chiang Feb. 12, 2025, 5:59 a.m. UTC | #1
Hi,

Michal Pecio <michal.pecio@gmail.com> 於 2025年2月11日 週二 下午8:36寫道:
>
> xHCI 4.9.1 requires HCs to obey the IOC flag we set on the last TRB even
> after an error has been reported on an earlier TRB. This typically means
> that an error mid TD is followed by a success event for the last TRB.
>
> On SuperSpeed (and only SS) isochronous endpoints Etron hosts were found
> to emit a success event also after an error on the last TRB of a TD.
>
> Reuse the machinery for handling errors mid TD to handle these otherwise
> unexpected events. Avoid printing "TRB not part of current TD" errors,
> ensure proper tracking of HC's internal dequeue pointer and distinguish
> this known quirk from other bogus events caused by ordinary bugs.
>
> This patch was found to eliminate all related warnings and errors while
> running for 30 minutes with a UVC camera on a flaky cable which produces
> transaction errors about every second. An altsetting was chosen which
> causes some TDs to be multi-TRB, dynamic debug was used to confirm that
> errors both mid TD and on the last TRB are handled as expected:
>
> [ 6028.439776] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint
> [ 6028.439784] xhci_hcd 0000:06:00.0: Error 4 mid isoc TD, wait for final completion event, is_last_trb=1
> [ 6028.440268] xhci_hcd 0000:06:00.0: Successful completion on short TX for slot 1 ep 2 with last td short 0
> [ 6028.440270] xhci_hcd 0000:06:00.0: Got event 1 after mid TD error
> [ 6029.123683] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint
> [ 6029.123694] xhci_hcd 0000:06:00.0: Error 4 mid isoc TD, wait for final completion event, is_last_trb=0
> [ 6029.123697] xhci_hcd 0000:06:00.0: Successful completion on short TX for slot 1 ep 2 with last td short 0
> [ 6029.123700] xhci_hcd 0000:06:00.0: Got event 1 after mid TD error
>
> Handling of Stopped events is unaffected: finish_td() is called but it
> does nothing and the TD waits until it's unlinked:
>
> [ 7081.705544] xhci_hcd 0000:06:00.0: Transfer error for slot 1 ep 2 on endpoint
> [ 7081.705546] xhci_hcd 0000:06:00.0: Error 4 mid isoc TD, wait for final completion event, is_last_trb=1
> [ 7081.705630] xhci_hcd 0000:06:00.0: Stopped on Transfer TRB for slot 1 ep 2
> [ 7081.705633] xhci_hcd 0000:06:00.0: Got event 26 after mid TD error
> [ 7081.705678] xhci_hcd 0000:06:00.0: Stopped on Transfer TRB for slot 1 ep 2
> [ 7081.705680] xhci_hcd 0000:06:00.0: Got event 26 after mid TD error
> [ 7081.705759] xhci_hcd 0000:06:00.0: Stopped on No-op or Link TRB for slot 1 ep 2
> [ 7081.705799] xhci_hcd 0000:06:00.0: Stopped on No-op or Link TRB for slot 1 ep 2
>
> Reported-by: Kuangyi Chiang <ki.chiang65@gmail.com>
> Closes: https://lore.kernel.org/linux-usb/20250205053750.28251-1-ki.chiang65@gmail.com/T/
> Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
> ---
>
>
>
> Hi Mathias,
>
> This is the best I was able to do. It does add a few lines, but I don't
> think it's too scary and IMO the switch looks even better this way. It
> accurately predicts those events while not breaking anything else that
> I can see or think of, save for the risk of firmware bugfix adding one
> ESIT of latency on errors.
>
> I tried to also test your Etron patch but it has whitespace damage all
> over the place and would be hard to apply.
>
> Regards,
> Michal
>
>
>  drivers/usb/host/xhci-ring.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 965bffce301e..7ff5075e5890 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -2437,6 +2437,7 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
>         bool sum_trbs_for_length = false;
>         u32 remaining, requested, ep_trb_len;
>         int short_framestatus;
> +       bool error_event = false, etron_quirk = false;
>
>         trb_comp_code = GET_COMP_CODE(le32_to_cpu(event->transfer_len));
>         urb_priv = td->urb->hcpriv;
> @@ -2473,8 +2474,7 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
>                 fallthrough;
>         case COMP_ISOCH_BUFFER_OVERRUN:
>                 frame->status = -EOVERFLOW;
> -               if (ep_trb != td->end_trb)
> -                       td->error_mid_td = true;
> +               error_event = true;
>                 break;
>         case COMP_INCOMPATIBLE_DEVICE_ERROR:
>         case COMP_STALL_ERROR:
> @@ -2483,8 +2483,7 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
>         case COMP_USB_TRANSACTION_ERROR:
>                 frame->status = -EPROTO;
>                 sum_trbs_for_length = true;
> -               if (ep_trb != td->end_trb)
> -                       td->error_mid_td = true;
> +               error_event = true;
>                 break;
>         case COMP_STOPPED:
>                 sum_trbs_for_length = true;
> @@ -2518,8 +2517,17 @@ static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
>         td->urb->actual_length += frame->actual_length;
>
>  finish_td:
> +       /* An error event mid TD will be followed by more events, xHCI 4.9.1 */
> +       td->error_mid_td |= error_event && (ep_trb != td->end_trb);
> +
> +       /* Etron treats *all* SuperSpeed isoc errors like errors mid TD */
> +       if (xhci->quirks & XHCI_ETRON_HOST && td->urb->dev->speed == USB_SPEED_SUPER) {
> +               td->error_mid_td |= error_event;
> +               etron_quirk |= error_event;

This would be the same as etron_quirk = error_event; right?

> +       }
> +
>         /* Don't give back TD yet if we encountered an error mid TD */
> -       if (td->error_mid_td && ep_trb != td->end_trb) {
> +       if (td->error_mid_td && (ep_trb != td->end_trb || etron_quirk)) {
>                 xhci_dbg(xhci, "Error mid isoc TD, wait for final completion event\n");
>                 td->urb_length_set = true;
>                 return;
> --
> 2.48.1

I tested this with Etron EJ168 and EJ188 under Linux-6.13.1. It works.

Thanks,
Kuangyi Chiang
Michał Pecio Feb. 12, 2025, 8:12 a.m. UTC | #2
On Wed, 12 Feb 2025 13:59:49 +0800, Kuangyi Chiang wrote:
> > +       if (xhci->quirks & XHCI_ETRON_HOST && td->urb->dev->speed == USB_SPEED_SUPER) {
> > +               td->error_mid_td |= error_event;
> > +               etron_quirk |= error_event;  
> 
> This would be the same as etron_quirk = error_event; right?

Yeah, same thing I guess.

> I tested this with Etron EJ168 and EJ188 under Linux-6.13.1. It works.

Well, I found one case where it doesn't work optimally. There is a
separate patch to skip "Missed Service Error" TDs immediately when the
error is reported and also to treat MSE as another 'error mid TD', so
with this Etron patch we would end up expecting spurious success after
an MSE on the last TRB.

Well, AFAIS, no such event is generated by Etron in this case so we are
back to waiting till next event and then giving back the missed TD.


Maybe I will seriously look into decoupling giveback and dequeue ptr
tracking, not only for those spurious Etron events but everywhere.

Mathias is right that HW has no sensible reason to touch DMA buffers
after an error, I will look if the spec is very explicit about it.
If so, we could give back TDs after the first event and merely keep
enough information to recognize and silently ignore further events.

Michal
diff mbox series

Patch

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 965bffce301e..7ff5075e5890 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2437,6 +2437,7 @@  static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 	bool sum_trbs_for_length = false;
 	u32 remaining, requested, ep_trb_len;
 	int short_framestatus;
+	bool error_event = false, etron_quirk = false;
 
 	trb_comp_code = GET_COMP_CODE(le32_to_cpu(event->transfer_len));
 	urb_priv = td->urb->hcpriv;
@@ -2473,8 +2474,7 @@  static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 		fallthrough;
 	case COMP_ISOCH_BUFFER_OVERRUN:
 		frame->status = -EOVERFLOW;
-		if (ep_trb != td->end_trb)
-			td->error_mid_td = true;
+		error_event = true;
 		break;
 	case COMP_INCOMPATIBLE_DEVICE_ERROR:
 	case COMP_STALL_ERROR:
@@ -2483,8 +2483,7 @@  static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 	case COMP_USB_TRANSACTION_ERROR:
 		frame->status = -EPROTO;
 		sum_trbs_for_length = true;
-		if (ep_trb != td->end_trb)
-			td->error_mid_td = true;
+		error_event = true;
 		break;
 	case COMP_STOPPED:
 		sum_trbs_for_length = true;
@@ -2518,8 +2517,17 @@  static void process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 	td->urb->actual_length += frame->actual_length;
 
 finish_td:
+	/* An error event mid TD will be followed by more events, xHCI 4.9.1 */
+	td->error_mid_td |= error_event && (ep_trb != td->end_trb);
+
+	/* Etron treats *all* SuperSpeed isoc errors like errors mid TD */
+	if (xhci->quirks & XHCI_ETRON_HOST && td->urb->dev->speed == USB_SPEED_SUPER) {
+		td->error_mid_td |= error_event;
+		etron_quirk |= error_event;
+	}
+
 	/* Don't give back TD yet if we encountered an error mid TD */
-	if (td->error_mid_td && ep_trb != td->end_trb) {
+	if (td->error_mid_td && (ep_trb != td->end_trb || etron_quirk)) {
 		xhci_dbg(xhci, "Error mid isoc TD, wait for final completion event\n");
 		td->urb_length_set = true;
 		return;