xhci: handle isoc Babble and Buffer Overrun events properly

Message ID 20240123101049.4f76f43f@foxbook (mailing list archive)
State Accepted
Commit 7c4650ded49e5b88929ecbbb631efb8b0838e811

Commit Message

Michał Pecio Jan. 23, 2024, 9:10 a.m. UTC
xHCI 4.9 explicitly forbids assuming that the xHC has released its
ownership of a multi-TRB TD when it reports an error on one of the
early TRBs. Yet the driver makes such an assumption and releases the TD,
allowing the remaining TRBs to be freed or overwritten by new TDs.

The xHC should also report completion of the final TRB due to its IOC
flag being set by us, regardless of prior errors. This event cannot
be recognized if the TD has already been freed earlier, resulting in
"Transfer event TRB DMA ptr not part of current TD" error message.

Fix this by reusing the logic for processing isoc Transaction Errors.
This also handles hosts which fail to report the final completion.

Also fix transfer length reporting on Babble errors. They may be caused
by a device malfunction, so there is no guarantee that the buffer has
been filled.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---

Question:

Will this become a game of whack-a-mole as new cases are reported?

Would it make sense to apply error_mid_td right away to more codes
that plausibly lead to an abort of the current TD?

Or do it after the initial patches prove themselves in the real world?


The impact of freeing owned TRBs is unknown. No one appears to have
ever complained, myself included. The error messages are merely an
annoyance - next event is a match and all is back to normal.
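
For illustration, the status handling described above can be modelled
outside the kernel. This is only a sketch under stated assumptions: the
model_* names, the index-based TRB identifiers and the give-back helper
are hypothetical and are not part of xhci-ring.c. It does not model
hosts which never report the final completion.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's completion codes. */
enum model_comp_code {
	MODEL_COMP_SUCCESS,
	MODEL_COMP_BABBLE_DETECTED,
	MODEL_COMP_ISOCH_BUFFER_OVERRUN,
};

/* Hypothetical TD model: TRBs are identified by index, not by pointer. */
struct model_td {
	int last_trb;       /* index of the final TRB, the one with IOC set */
	bool error_mid_td;  /* an error was reported before the final TRB */
};

struct model_frame {
	int status;
};

/*
 * Mirror the switch from the patch: both codes report -EOVERFLOW and
 * mark the TD when the event is not for the final TRB. Only Babble
 * asks for the length to be summed from the TRBs, because the buffer
 * may not have been filled.
 */
static void model_isoc_status(enum model_comp_code code, int ep_trb,
			      struct model_td *td, struct model_frame *frame,
			      bool *sum_trbs_for_length)
{
	switch (code) {
	case MODEL_COMP_BABBLE_DETECTED:
		*sum_trbs_for_length = true;
		/* fall through */
	case MODEL_COMP_ISOCH_BUFFER_OVERRUN:
		frame->status = -EOVERFLOW;
		if (ep_trb != td->last_trb)
			td->error_mid_td = true;
		break;
	default:
		frame->status = 0;
		break;
	}
}

/*
 * Assumption for this sketch: a TD with a mid-TD error is released
 * only when the event refers to its final TRB.
 */
static bool model_can_give_back(const struct model_td *td, int ep_trb)
{
	return !(td->error_mid_td && ep_trb != td->last_trb);
}
```

With a three-TRB TD (last_trb = 2), a Buffer Overrun reported on TRB 0
sets error_mid_td and the model withholds the TD until the event for
TRB 2 arrives.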


 drivers/usb/host/xhci-ring.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Mathias Nyman Jan. 23, 2024, 3:32 p.m. UTC | #1
On 23.1.2024 11.10, Michal Pecio wrote:
> xHCI 4.9 explicitly forbids assuming that the xHC has released its
> ownership of a multi-TRB TD when it reports an error on one of the
> early TRBs. Yet the driver makes such an assumption and releases the TD,
> allowing the remaining TRBs to be freed or overwritten by new TDs.
> 
> The xHC should also report completion of the final TRB due to its IOC
> flag being set by us, regardless of prior errors. This event cannot
> be recognized if the TD has already been freed earlier, resulting in
> "Transfer event TRB DMA ptr not part of current TD" error message.
> 
> Fix this by reusing the logic for processing isoc Transaction Errors.
> This also handles hosts which fail to report the final completion.
> 
> Also fix transfer length reporting on Babble errors. They may be caused
> by a device malfunction, so there is no guarantee that the buffer has
> been filled.
> 
> Signed-off-by: Michal Pecio <michal.pecio@gmail.com>

Thanks, adding to queue.

> ---
> 
> Question:
> 
> Will this become a game of whack-a-mole as new cases are reported?
> 
> Would it make sense to apply error_mid_td right away to more codes
> that plausibly lead to an abort of the current TD?
> 
> Or do it after the initial patches prove themselves in the real world?

I'd send tested patches that solve real world issues first to the usb-linus
(6.8 kernel), with stable tags.

Then sort out if we need to add error_mid_td to other completion codes,
and send patches for those to usb-next (6.9 kernel)

-Mathias

Patch

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 22a4aa65e4c9..9673354d70d5 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2393,9 +2393,13 @@  static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 	case COMP_BANDWIDTH_OVERRUN_ERROR:
 		frame->status = -ECOMM;
 		break;
-	case COMP_ISOCH_BUFFER_OVERRUN:
 	case COMP_BABBLE_DETECTED_ERROR:
+		sum_trbs_for_length = true;
+		fallthrough;
+	case COMP_ISOCH_BUFFER_OVERRUN:
 		frame->status = -EOVERFLOW;
+		if (ep_trb != td->last_trb)
+			td->error_mid_td = true;
 		break;
 	case COMP_INCOMPATIBLE_DEVICE_ERROR:
 	case COMP_STALL_ERROR: