Message ID | 20240123101049.4f76f43f@foxbook (mailing list archive) |
---|---|
State | Accepted |
Commit | 7c4650ded49e5b88929ecbbb631efb8b0838e811 |
Series | xhci: handle isoc Babble and Buffer Overrun events properly |

On 23.1.2024 11.10, Michal Pecio wrote:
> xHCI 4.9 explicitly forbids assuming that the xHC has released its
> ownership of a multi-TRB TD when it reports an error on one of the
> early TRBs. Yet the driver makes such assumption and releases the TD,
> allowing the remaining TRBs to be freed or overwritten by new TDs.
>
> The xHC should also report completion of the final TRB due to its IOC
> flag being set by us, regardless of prior errors. This event cannot
> be recognized if the TD has already been freed earlier, resulting in
> "Transfer event TRB DMA ptr not part of current TD" error message.
>
> Fix this by reusing the logic for processing isoc Transaction Errors.
> This also handles hosts which fail to report the final completion.
>
> Fix transfer length reporting on Babble errors. They may be caused by
> device malfunction, no guarantee that the buffer has been filled.
>
> Signed-off-by: Michal Pecio <michal.pecio@gmail.com>

Thanks, adding to queue.

> ---
>
> Question:
>
> Will this become a game of whack-a-mole as new cases are reported?
>
> Would it make sense to apply error_mid_td right away to more codes
> that plausibly lead to an abort of the current TD?
>
> Or do it after the initial patches prove themselves in real world?

I'd send tested patches that solve real world issues first to the
usb-linus (6.8 kernel), with stable tags.

Then sort out if we need to add error_mid_td to other completion codes,
and send patches for those to usb-next (6.9 kernel)

-Mathias

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 22a4aa65e4c9..9673354d70d5 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2393,9 +2393,13 @@ static int process_isoc_td(struct xhci_hcd *xhci, struct xhci_virt_ep *ep,
 	case COMP_BANDWIDTH_OVERRUN_ERROR:
 		frame->status = -ECOMM;
 		break;
-	case COMP_ISOCH_BUFFER_OVERRUN:
 	case COMP_BABBLE_DETECTED_ERROR:
+		sum_trbs_for_length = true;
+		fallthrough;
+	case COMP_ISOCH_BUFFER_OVERRUN:
 		frame->status = -EOVERFLOW;
+		if (ep_trb != td->last_trb)
+			td->error_mid_td = true;
 		break;
 	case COMP_INCOMPATIBLE_DEVICE_ERROR:
 	case COMP_STALL_ERROR:

xHCI 4.9 explicitly forbids assuming that the xHC has released its
ownership of a multi-TRB TD when it reports an error on one of the
early TRBs. Yet the driver makes such assumption and releases the TD,
allowing the remaining TRBs to be freed or overwritten by new TDs.

The xHC should also report completion of the final TRB due to its IOC
flag being set by us, regardless of prior errors. This event cannot
be recognized if the TD has already been freed earlier, resulting in
"Transfer event TRB DMA ptr not part of current TD" error message.

Fix this by reusing the logic for processing isoc Transaction Errors.
This also handles hosts which fail to report the final completion.

Fix transfer length reporting on Babble errors. They may be caused by
device malfunction, no guarantee that the buffer has been filled.

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---

Question:

Will this become a game of whack-a-mole as new cases are reported?

Would it make sense to apply error_mid_td right away to more codes
that plausibly lead to an abort of the current TD?

Or do it after the initial patches prove themselves in real world?

The impact of freeing owned TRBs is unknown. No one appears to have
ever complained, myself included. The error messages are merely an
annoyance - next event is a match and all is back to normal.

 drivers/usb/host/xhci-ring.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
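
For readers who want to poke at the patched branch outside the kernel, the following is a minimal standalone sketch of the new switch logic. It is not the driver code: `struct model_td`, `struct model_result`, `model_process_isoc_event()` and the enum values are hypothetical stand-ins, TRBs are modelled as plain integer indices, and only the two completion codes touched by the patch are handled. The later step, where the driver waits for the final TRB's event before giving the TD back, is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the values do not match the kernel's enum. */
enum model_comp_code {
	MODEL_COMP_SUCCESS,
	MODEL_COMP_BABBLE_DETECTED_ERROR,
	MODEL_COMP_ISOCH_BUFFER_OVERRUN,
};

/* Simplified TD: TRBs are identified by index instead of by pointer. */
struct model_td {
	int last_trb;		/* index of the final TRB of this TD */
	bool error_mid_td;	/* set when an error hits an earlier TRB */
};

struct model_result {
	int status;			/* value that would go to frame->status */
	bool sum_trbs_for_length;	/* compute length from completed TRBs */
};

/* Mirrors only the two cases changed by the patch. */
static struct model_result
model_process_isoc_event(struct model_td *td, int ep_trb,
			 enum model_comp_code code)
{
	struct model_result res = { .status = 0, .sum_trbs_for_length = false };

	assert(td != NULL);

	switch (code) {
	case MODEL_COMP_BABBLE_DETECTED_ERROR:
		/* Babble: do not assume the buffer was filled. */
		res.sum_trbs_for_length = true;
		/* fall through */
	case MODEL_COMP_ISOCH_BUFFER_OVERRUN:
		res.status = -EOVERFLOW;
		/* Error before the last TRB: the xHC may still own the TD. */
		if (ep_trb != td->last_trb)
			td->error_mid_td = true;
		break;
	default:
		break;
	}
	return res;
}
```

In this model, a Babble event on TRB 0 of a TD whose last TRB is 2 yields a status of `-EOVERFLOW`, requests summed-TRB length accounting and sets `error_mid_td`. The same event on TRB 2 leaves `error_mid_td` clear, since the TD has no further TRBs for the xHC to process.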