Message ID | 20250226080202.7eb5e142@foxbook (mailing list archive) |
---|---|
Headers | show |
Series | xHCI: Isochronous error handling fixes and improvements | expand |
On 26.2.2025 9.02, Michal Pecio wrote: > These patches reduce latency of error reporting in some cases related > to 'error mid TD' and Missed Service events and sometimes fix a failure > to give back such TDs altogether until they are cancelled. > > Also included are fixes for potential packet loss or memory corruption > due to obscure races. Whether it causes problems IRL is not known and > the worst case would be hard to reproduce, but exactly for this reason > if the worst case actually happens, it could be hard to debug too. > > The first three should be safe. The fourth should also be safe, but it > relies on HC functionality Linux never relied on before, so I placed it > towards the end in case it would need some tweaks. I tested it on all > hardware I have and it worked just fine. > > The last one is perhaps the most controversial, though it should be > OK with typical "complete -> resubmit" drivers. It's the only one here > which increases latency in some severe error cases. The intent is to > avoid potentially giving back URBs not yet executed by hardware. > > New in v3: > - dropped the cleanup patch > - added Don't skip on Stopped - Length Invalid > > New in v3: > - fixed spurious empty list warning on xrun > - clear skip flag if one skipped event was the last > > Michal Pecio (5): > usb: xhci: Don't skip on Stopped - Length Invalid > usb: xhci: Complete 'error mid TD' transfers when handling Missed > Service > usb: xhci: Fix isochronous Ring Underrun/Overrun event handling > usb: xhci: Expedite skipping missed isoch TDs on modern HCs > usb: xhci: Skip only one TD on Ring Underrun/Overrun > Updated my for-usb-next branch to this v3 version Thanks Mathias
On Wed, 26 Feb 2025 14:41:44 +0200, Mathias Nyman wrote: > Updated my for-usb-next branch to this v3 version A few remarks regarding "Add helper to find trb from its dma address": xhci_dma_to_trb(): This function could use xhci_for_each_ring_seg. The use of in_range() perhaps deserves a comment, because its correctness is not as obvious as it might seem. Long story short, my own version: /* * Look up a TRB on the ring by its DMA address, return NULL if not found. * Start from deq_seg to optimize for event handling use. * * Note: false positive is possible if dma < TRB_SEGMENT_SIZE *and* * seg->dma > (dma_addr_t) 0 - TRB_SEGMENT_SIZE, but that shouldn't * happen if seg->dma is an allocation of size TRB_SEGMENT_SIZE. */ static union xhci_trb *xhci_dma_to_trb(struct xhci_ring *ring, dma_addr_t dma) { struct xhci_segment *seg; xhci_for_each_ring_seg(ring->deq_seg, seg) if (in_range(dma, seg->dma, TRB_SEGMENT_SIZE)) return seg->trbs + (dma - seg->dma) / sizeof(seg->trbs[0]); return NULL; } >+ struct xhci_td *matched_td; This variable is only used as bool so it could be declared as such. Other places still use 'td' and assume that it equals 'matched_td'. And that's OK because there is no need for iterating further after the matching TD is found. >+ /* find the transfer trb this events points to */ >+ if (ep_trb_dma) >+ ep_trb = xhci_dma_to_trb(ep_ring, ep_trb_dma); This may deserve a dedicated warning. It's a pathology. Either the event is bogus due to internal corruption in the HC, or it's executing TRBs from a wrong ring due to damaged/ignored Link TRB or bad Set Deq. Or we completely screwed up and are looking at a wrong ep_ring here. >- if (trb_comp_code == COMP_MISSED_SERVICE_ERROR && !ep_trb_dma) >+ if (trb_comp_code == COMP_MISSED_SERVICE_ERROR && !ep_trb) > return 0; Good idea. I think I would go further and refuse to handle anything when (ep_trb_dma && !ep_trb). Nothing is going to match, nothing good will come from trying as far as I see. But that's a behavior change, so maybe material for a separate patch. >+ matched_td = NULL; > > /* Is this a TRB in the currently executing TD? */ >- ep_seg = trb_in_td(xhci, td, ep_trb_dma, false); >+ if (trb_in_td(xhci, td, ep_trb_dma, false)) >+ matched_td = td; If 'matched_td' is declared as bool, this will become simply: matched_td = trb_in_td(xhci, td, ep_trb_dma, false); Regards, Michal
On 27.2.2025 0.05, Michał Pecio wrote: > On Wed, 26 Feb 2025 14:41:44 +0200, Mathias Nyman wrote: >> Updated my for-usb-next branch to this v3 version > > > A few remarks regarding "Add helper to find trb from its dma address": > > xhci_dma_to_trb(): > This function could use xhci_for_each_ring_seg. Good point > The use of in_range() perhaps deserves a comment, because its > correctness is not as obvious as it might seem. > > Long story short, my own version: > > /* > * Look up a TRB on the ring by its DMA address, return NULL if not found. > * Start from deq_seg to optimize for event handling use. > * > * Note: false positive is possible if dma < TRB_SEGMENT_SIZE *and* > * seg->dma > (dma_addr_t) 0 - TRB_SEGMENT_SIZE, but that shouldn't > * happen if seg->dma is an allocation of size TRB_SEGMENT_SIZE. > */ True, but as you said this shouldn't happen as we allocate TRB_SEGMENT_SIZE bytes starting at seg->dma, so seg->dma should be at least TRB_SEGMENT_SIZE bytes from max u64 (or u32) We can also use "if (dma >= seg->dma && (dma - seg->dma) < TRB_SEGMENT_SIZE)" instead of in_range(). It's a bit uglier, but we can skip additional notes. > static union xhci_trb *xhci_dma_to_trb(struct xhci_ring *ring, dma_addr_t dma) > { > struct xhci_segment *seg; > > xhci_for_each_ring_seg(ring->deq_seg, seg) > if (in_range(dma, seg->dma, TRB_SEGMENT_SIZE)) > return seg->trbs + (dma - seg->dma) / sizeof(seg->trbs[0]); > > return NULL; > } > >> + struct xhci_td *matched_td; > > This variable is only used as bool so it could be declared as such. > Other places still use 'td' and assume that it equals 'matched_td'. > And that's OK because there is no need for iterating further after > the matching TD is found. True, it could be just a bool > >> + /* find the transfer trb this events points to */ >> + if (ep_trb_dma) >> + ep_trb = xhci_dma_to_trb(ep_ring, ep_trb_dma); > > This may deserve a dedicated warning. It's a pathology. Either the > event is bogus due to internal corruption in the HC, or it's executing > TRBs from a wrong ring due to damaged/ignored Link TRB or bad Set Deq. > Or we completely screwed up and are looking at a wrong ep_ring here. > >> - if (trb_comp_code == COMP_MISSED_SERVICE_ERROR && !ep_trb_dma) >> + if (trb_comp_code == COMP_MISSED_SERVICE_ERROR && !ep_trb) >> return 0; > > Good idea. I think I would go further and refuse to handle anything > when (ep_trb_dma && !ep_trb). Nothing is going to match, nothing good > will come from trying as far as I see. > > But that's a behavior change, so maybe material for a separate patch. Idea of this patch is to slowly migrate handle_tx_event() towards the vision in my feature_transfer_event_rework branch https://web.git.kernel.org/pub/scm/linux/kernel/git/mnyman/xhci.git/log/?h=feature_transfer_event_rework Niklas is working towards a similar goal, and he just informed me that this patch conflicts a bit with his plan to get there, so I might drop this. Thanks for looking at it. -Mathias