
[5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun

Message ID 20250210084220.3e5414e9@foxbook
State New
Series xHCI: Isochronous error handling fixes and improvements

Commit Message

Michal Pecio Feb. 10, 2025, 7:42 a.m. UTC
If skipping is deferred to events other than Missed Service Error itself,
it means we are running on an xHCI 1.0 host and don't know how many TDs
were missed until we reach some ordinary transfer completion event.

And in case of ring xrun, we can't know where the xrun happened either.

If we skip all pending TDs, we may prematurely give back TDs added after
the xrun had occurred, risking data loss or buffer UAF by the xHC.

If we skip none, a driver may become confused and stop working when all
its URBs are missed and appear to be "in flight" forever.

Skip exactly one TD on each xrun event - the first one that was missed,
as we can now be sure that the HC has finished processing it. Provided
that one more TD is queued before any subsequent doorbell ring, it will
become safe to skip another TD by the time we get an xrun again.
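The policy in the paragraph above can be illustrated with a toy model. This is not the xhci-ring.c code: the TD list is reduced to a FIFO of integer IDs, and every name below (toy_td_list, toy_queue_td, toy_skip_one_on_xrun) is hypothetical. It only shows that giving back one TD per xrun event never touches a TD that may have been queued after the xrun.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model, NOT kernel code: a FIFO of pending isochronous TDs on one
 * endpoint ring. All names here are hypothetical.
 */
#define MAX_TDS 16

struct toy_td_list {
	int id[MAX_TDS];   /* TD identifiers in queue order */
	size_t head;       /* index of the oldest pending TD */
	size_t tail;       /* one past the newest pending TD */
};

static void toy_queue_td(struct toy_td_list *l, int id)
{
	assert(l->tail < MAX_TDS);
	l->id[l->tail++] = id;
}

static size_t toy_pending(const struct toy_td_list *l)
{
	return l->tail - l->head;
}

/*
 * On a ring xrun event, with no indication of how many TDs were missed,
 * give back only the oldest pending TD: the HC must be done with it,
 * while newer TDs may have been queued after the xrun.
 * Returns the skipped TD's id, or -1 if nothing was pending.
 */
static int toy_skip_one_on_xrun(struct toy_td_list *l)
{
	if (toy_pending(l) == 0)
		return -1;
	return l->id[l->head++];
}
```

With TDs 1, 2 and 3 queued, a first xrun gives back only TD 1; if TD 4 is queued before the next xrun, that event gives back TD 2, and so on, so one TD is retired per event rather than the whole list.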

Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---
 drivers/usb/host/xhci-ring.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Mathias Nyman Feb. 11, 2025, 3:41 p.m. UTC | #1
On 10.2.2025 9.42, Michal Pecio wrote:
> If skipping is deferred to events other than Missed Service Error itself,
> it means we are running on an xHCI 1.0 host and don't know how many TDs
> were missed until we reach some ordinary transfer completion event.
> 
> And in case of ring xrun, we can't know where the xrun happened either.
> 
> If we skip all pending TDs, we may prematurely give back TDs added after
> the xrun had occurred, risking data loss or buffer UAF by the xHC.
> 
> If we skip none, a driver may become confused and stop working when all
> its URBs are missed and appear to be "in flight" forever.
> 
> Skip exactly one TD on each xrun event - the first one that was missed,
> as we can now be sure that the HC has finished processing it. Provided
> that one more TD is queued before any subsequent doorbell ring, it will
> become safe to skip another TD by the time we get an xrun again.
> 
> Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
> ---
>   drivers/usb/host/xhci-ring.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
> index 878abf5b745d..049206a1db76 100644
> --- a/drivers/usb/host/xhci-ring.c
> +++ b/drivers/usb/host/xhci-ring.c
> @@ -2875,6 +2875,18 @@ static int handle_tx_event(struct xhci_hcd *xhci,
>   
>   			if (!ep_seg && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
>   				skip_isoc_td(xhci, td, ep, status);
> +
> +				if (ring_xrun_event) {
> +					/*
> +					 * If we are here, we are on xHCI 1.0 host with no idea how
> +					 * many TDs were missed and where the xrun occurred. Don't
> +					 * skip more TDs, they may have been queued after the xrun.
> +					 */
> +					xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
> +							slot_id, ep_index);
> +					break;

This would be the same as return 0; right?

Whole series looks good, I'll add it

-Mathias
Michal Pecio Feb. 12, 2025, 7:30 a.m. UTC | #2
On Tue, 11 Feb 2025 17:41:39 +0200, Mathias Nyman wrote:
> > +				if (ring_xrun_event) {
> > +					/*
> > +					 * If we are here, we are on xHCI 1.0 host with no idea how
> > +					 * many TDs were missed and where the xrun occurred. Don't
> > +					 * skip more TDs, they may have been queued after the xrun.
> > +					 */
> > +					xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
> > +							slot_id, ep_index);
> > +					break;  
> 
> This would be the same as return 0; right?

Currently, yes. I know it looks silly, but I thought it would be more
future-proof than hardcoding 'return 0' into the loop. The point is to
simply stop iteration; what happens next is none of the loop's business.

I hope gcc is clever enough to do the right thing here.
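A minimal sketch of the equivalence discussed above, using hypothetical functions unrelated to the xHCI code: when the only statement after the loop is "return 0", breaking out and returning 0 directly produce the same result and the same side effects. The difference is only where the function's result is decided.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical example: visits entries until one says "stop". */
static int walk_with_break(const bool *stop, int n, int *visited)
{
	assert(visited != 0);
	*visited = 0;
	for (int i = 0; i < n; i++) {
		(*visited)++;
		if (stop[i])
			break;      /* the loop only ends the iteration... */
	}
	return 0;               /* ...the result is decided after the loop */
}

static int walk_with_return(const bool *stop, int n, int *visited)
{
	assert(visited != 0);
	*visited = 0;
	for (int i = 0; i < n; i++) {
		(*visited)++;
		if (stop[i])
			return 0;   /* the function's result is hardcoded in the loop */
	}
	return 0;
}
```

If code were later added between the loop and the final return, only the break variant would run it on the early-exit path, which is the future-proofing argument made above.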

Regards,
Michal
Michal Pecio Feb. 21, 2025, 1:17 a.m. UTC | #3
On Tue, 11 Feb 2025 17:41:39 +0200, Mathias Nyman wrote:
> On 10.2.2025 9.42, Michal Pecio wrote:
> > +				if (ring_xrun_event) {
> > +					/*
> > +					 * If we are here, we are on xHCI 1.0 host with no idea how
> > +					 * many TDs were missed and where the xrun occurred. Don't
> > +					 * skip more TDs, they may have been queued after the xrun.
> > +					 */
> > +					xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
> > +							slot_id, ep_index);
> > +					break;  
> 
> This would be the same as return 0; right?
> 
> Whole series looks good, I'll add it

I hope you haven't sent it out yet because I found two minor issues.


Firstly,
[PATCH 3/5] usb: xhci: Fix isochronous Ring Underrun/Overrun event handling

increases the number of xrun events that we handle but doesn't suppress
the "Event TRB for slot %u ep %u with no TDs queued\n" warning, so the
warning started to show up sometimes for no good reason. The fix is to
add ring_xrun_event to the list of exceptions for this warning.
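A sketch of the kind of predicate described above. The completion codes and the function name are hypothetical, and treating the two "stopped" completions as pre-existing exceptions is an assumption made for illustration; the real condition lives in handle_tx_event() and may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical completion codes, for illustration only. */
enum toy_comp_code {
	TOY_COMP_SUCCESS,
	TOY_COMP_STOPPED,
	TOY_COMP_STOPPED_LENGTH_INVALID,
	TOY_COMP_RING_UNDERRUN,
	TOY_COMP_RING_OVERRUN,
};

/*
 * Should an event that arrives while the TD list is empty trigger the
 * "no TDs queued" warning? Ring Underrun/Overrun events can legitimately
 * arrive with an empty list, so they join the list of exceptions.
 */
static bool toy_warn_no_tds(enum toy_comp_code code)
{
	bool ring_xrun_event = code == TOY_COMP_RING_UNDERRUN ||
			       code == TOY_COMP_RING_OVERRUN;

	assert(code >= TOY_COMP_SUCCESS && code <= TOY_COMP_RING_OVERRUN);

	switch (code) {
	case TOY_COMP_STOPPED:
	case TOY_COMP_STOPPED_LENGTH_INVALID:
		return false;           /* assumed existing exceptions */
	default:
		return !ring_xrun_event;
	}
}
```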


Secondly,
[PATCH 5/5] usb: xhci: Skip only one TD on Ring Underrun/Overrun

can be improved to clear the skip flag if the skipped TD was the only one.
This eliminates any confusion and risk of skipping bugs in the future.
The change is a matter of moving that code to a different branch.

I also changed 'break' to 'return 0' because it gets hard to follow at
this level of indentation.
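The v2 change described above can be sketched on a toy endpoint. The struct and its fields are hypothetical and only loosely mirror the "skip flag" and TD list from the thread: after giving back one TD on an xrun event, the flag is cleared if no TDs remain, and the handler returns 0 directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy endpoint state; field names are hypothetical. */
struct toy_ep {
	size_t pending_tds;  /* TDs still on the ring's TD list */
	bool skip;           /* set while missed TDs await skipping */
};

/*
 * On a ring xrun event with skipping pending, give back exactly one TD.
 * If that TD was the only one queued, nothing is left to skip, so clear
 * the flag now rather than leaving it set for some unrelated later event.
 */
static int toy_handle_xrun_skip(struct toy_ep *ep)
{
	assert(ep->skip);
	if (ep->pending_tds == 0)
		return 0;

	ep->pending_tds--;          /* skip the oldest missed TD */
	if (ep->pending_tds == 0)
		ep->skip = false;   /* the skipped TD was the only one */
	return 0;                   /* newer TDs may postdate the xrun */
}
```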


I'll send a v2 of those two patches. Sorry for any inconvenience.

Michal

Patch

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 878abf5b745d..049206a1db76 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2875,6 +2875,18 @@ static int handle_tx_event(struct xhci_hcd *xhci,
 
 			if (!ep_seg && usb_endpoint_xfer_isoc(&td->urb->ep->desc)) {
 				skip_isoc_td(xhci, td, ep, status);
+
+				if (ring_xrun_event) {
+					/*
+					 * If we are here, we are on xHCI 1.0 host with no idea how
+					 * many TDs were missed and where the xrun occurred. Don't
+					 * skip more TDs, they may have been queued after the xrun.
+					 */
+					xhci_dbg(xhci, "Skipped one TD for slot %u ep %u",
+							slot_id, ep_index);
+					break;
+				}
+
 				if (!list_empty(&ep_ring->td_list))
 					continue;