Message ID | 20240911051716.6572-2-ki.chiang65@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/3] xhci: Don't issue Reset Device command to Etron xHCI host | expand |
Hi, > This happens when the xHCI driver enqueue a control TD (which cross > over the Link TRB between two ring segments, as shown) in the endpoint > zero's transfer ring. Seems the Etron xHCI host can not perform this > TD correctly, causing the USB transfer error occurred, maybe the upper > driver retry that control-IN request can solve problem, but not all > drivers do this. > > | | > ------- > | TRB | Setup Stage > ------- > | TRB | Link > ------- > ------- > | TRB | Data Stage > ------- > | TRB | Status Stage > ------- > | | I wonder about a few things. 1. What are the exact symptoms, besides Ethernet driver errors? Any errors from xhci_hcd? What if dynamic debug is enabled? 2. How did you determine that this is the exact cause? 3. Does it happen every time when a Link follows Setup, or only randomly and it takes lots of control transfers to trigger it? 4. How is it even possible? As far as I see, Linux simply queues three TRBs for a control URB. There are 255 slots in a segemnt, so exactly 85 URBs should fit, and then back to the first slot. Regards, Michal
On 11.9.2024 8.17, Kuangyi Chiang wrote: > Performing a stability stress test on a USB3.0 2.5G ethernet adapter > results in errors like this: > > [ 91.441469] r8152 2-3:1.0 eth3: get_registers -71 > [ 91.458659] r8152 2-3:1.0 eth3: get_registers -71 > [ 91.475911] r8152 2-3:1.0 eth3: get_registers -71 > [ 91.493203] r8152 2-3:1.0 eth3: get_registers -71 > [ 91.510421] r8152 2-3:1.0 eth3: get_registers -71 > > The r8152 driver will periodically issue lots of control-IN requests > to access the status of ethernet adapter hardware registers during > the test. > > This happens when the xHCI driver enqueue a control TD (which cross > over the Link TRB between two ring segments, as shown) in the endpoint > zero's transfer ring. Seems the Etron xHCI host can not perform this > TD correctly, causing the USB transfer error occurred, maybe the upper > driver retry that control-IN request can solve problem, but not all > drivers do this. > > | | > ------- > | TRB | Setup Stage > ------- > | TRB | Link > ------- > ------- > | TRB | Data Stage > ------- > | TRB | Status Stage > ------- > | | > What if the link TRB is between Data and Status stage, does that case work normally? > To work around this, the xHCI driver should enqueue a No Op TRB if > next available TRB is the Link TRB in the ring segment, this can > prevent the Setup and Data Stage TRB to be breaked by the Link TRB. There are some hosts that need the 'Chain' bit set in the Link TRB, does that work in this case? Thanks Mathias
> 4. How is it even possible? As far as I see, Linux simply queues > three TRBs for a control URB. There are 255 slots in a segemnt, > so exactly 85 URBs should fit, and then back to the first slot. Not all control transfers have a Data stage TRB. -Mathias
Hi, Thank you for the review. Michał Pecio <michal.pecio@gmail.com> 於 2024年9月11日 週三 下午3:52寫道: > > Hi, > > > This happens when the xHCI driver enqueue a control TD (which cross > > over the Link TRB between two ring segments, as shown) in the endpoint > > zero's transfer ring. Seems the Etron xHCI host can not perform this > > TD correctly, causing the USB transfer error occurred, maybe the upper > > driver retry that control-IN request can solve problem, but not all > > drivers do this. > > > > | | > > ------- > > | TRB | Setup Stage > > ------- > > | TRB | Link > > ------- > > ------- > > | TRB | Data Stage > > ------- > > | TRB | Status Stage > > ------- > > | | > > I wonder about a few things. > > 1. What are the exact symptoms, besides Ethernet driver errors? > Any errors from xhci_hcd? What if dynamic debug is enabled? The xhci driver receives a transfer event TRB (completion code is "USB Transaction Error") when the issue is triggered. > > 2. How did you determine that this is the exact cause? The issue is triggered every time when a Link TRB follows a Setup Stage TRB. > > 3. Does it happen every time when a Link follows Setup, or only > randomly and it takes lots of control transfers to trigger it? Yes, it happens every time. > > 4. How is it even possible? As far as I see, Linux simply queues > three TRBs for a control URB. There are 255 slots in a segemnt, > so exactly 85 URBs should fit, and then back to the first slot. The xhci driver also queues no data control transfers. > > Regards, > Michal Thanks, Kuangyi Chiang
Hi, Thank you for the review. Mathias Nyman <mathias.nyman@linux.intel.com> 於 2024年9月11日 週三 下午11:05寫道: > > On 11.9.2024 8.17, Kuangyi Chiang wrote: > > Performing a stability stress test on a USB3.0 2.5G ethernet adapter > > results in errors like this: > > > > [ 91.441469] r8152 2-3:1.0 eth3: get_registers -71 > > [ 91.458659] r8152 2-3:1.0 eth3: get_registers -71 > > [ 91.475911] r8152 2-3:1.0 eth3: get_registers -71 > > [ 91.493203] r8152 2-3:1.0 eth3: get_registers -71 > > [ 91.510421] r8152 2-3:1.0 eth3: get_registers -71 > > > > The r8152 driver will periodically issue lots of control-IN requests > > to access the status of ethernet adapter hardware registers during > > the test. > > > > This happens when the xHCI driver enqueue a control TD (which cross > > over the Link TRB between two ring segments, as shown) in the endpoint > > zero's transfer ring. Seems the Etron xHCI host can not perform this > > TD correctly, causing the USB transfer error occurred, maybe the upper > > driver retry that control-IN request can solve problem, but not all > > drivers do this. > > > > | | > > ------- > > | TRB | Setup Stage > > ------- > > | TRB | Link > > ------- > > ------- > > | TRB | Data Stage > > ------- > > | TRB | Status Stage > > ------- > > | | > > > > What if the link TRB is between Data and Status stage, does that > case work normally? I am not sure, I don't encounter this case, maybe OK. > > > To work around this, the xHCI driver should enqueue a No Op TRB if > > next available TRB is the Link TRB in the ring segment, this can > > prevent the Setup and Data Stage TRB to be breaked by the Link TRB. > > There are some hosts that need the 'Chain' bit set in the Link TRB, > does that work in this case? No, it doesn't work. It seems to be a hardware issue. > > Thanks > Mathias > Thanks, Kuangyi Chiang
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index 2fa7f32c2bf9..dda873f3fee7 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -398,12 +398,14 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) xhci->quirks |= XHCI_RESET_ON_RESUME; xhci->quirks |= XHCI_BROKEN_STREAMS; xhci->quirks |= XHCI_NO_RESET_DEVICE; + xhci->quirks |= XHCI_NO_BREAK_CTRL_TD; } if (pdev->vendor == PCI_VENDOR_ID_ETRON && pdev->device == PCI_DEVICE_ID_EJ188) { xhci->quirks |= XHCI_RESET_ON_RESUME; xhci->quirks |= XHCI_BROKEN_STREAMS; xhci->quirks |= XHCI_NO_RESET_DEVICE; + xhci->quirks |= XHCI_NO_BREAK_CTRL_TD; } if (pdev->vendor == PCI_VENDOR_ID_RENESAS && diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 4ea2c3e072a9..1c387d4dc152 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -3727,6 +3727,19 @@ int xhci_queue_ctrl_tx(struct xhci_hcd *xhci, gfp_t mem_flags, if (!urb->setup_packet) return -EINVAL; + if (xhci->quirks & XHCI_NO_BREAK_CTRL_TD) { + /* + * If next available TRB is the Link TRB in the ring segment then + * enqueue a No Op TRB, this can prevent the Setup and Data Stage + * TRB to be breaked by the Link TRB. + */ + if (trb_is_link(ep_ring->enqueue + 1)) { + field = TRB_TYPE(TRB_TR_NOOP) | ep_ring->cycle_state; + queue_trb(xhci, ep_ring, false, 0, 0, + TRB_INTR_TARGET(0), field); + } + } + /* 1 TRB for setup, 1 for status */ num_trbs = 2; /* diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index 1272d725270a..aedbe8fee8be 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1629,6 +1629,7 @@ struct xhci_hcd { #define XHCI_ZHAOXIN_HOST BIT_ULL(46) #define XHCI_WRITE_64_HI_LO BIT_ULL(47) #define XHCI_NO_RESET_DEVICE BIT_ULL(48) +#define XHCI_NO_BREAK_CTRL_TD BIT_ULL(49) unsigned int num_active_eps; unsigned int limit_active_eps;
Performing a stability stress test on a USB3.0 2.5G ethernet adapter results in errors like this: [ 91.441469] r8152 2-3:1.0 eth3: get_registers -71 [ 91.458659] r8152 2-3:1.0 eth3: get_registers -71 [ 91.475911] r8152 2-3:1.0 eth3: get_registers -71 [ 91.493203] r8152 2-3:1.0 eth3: get_registers -71 [ 91.510421] r8152 2-3:1.0 eth3: get_registers -71 The r8152 driver will periodically issue lots of control-IN requests to access the status of ethernet adapter hardware registers during the test. This happens when the xHCI driver enqueue a control TD (which cross over the Link TRB between two ring segments, as shown) in the endpoint zero's transfer ring. Seems the Etron xHCI host can not perform this TD correctly, causing the USB transfer error occurred, maybe the upper driver retry that control-IN request can solve problem, but not all drivers do this. | | ------- | TRB | Setup Stage ------- | TRB | Link ------- ------- | TRB | Data Stage ------- | TRB | Status Stage ------- | | To work around this, the xHCI driver should enqueue a No Op TRB if next available TRB is the Link TRB in the ring segment, this can prevent the Setup and Data Stage TRB to be breaked by the Link TRB. Add a new quirk flag XHCI_NO_BREAK_CTRL_TD to invoke the workaround in xhci_queue_ctrl_tx(). Both EJ168 and EJ188 have the same problem, applying this patch then the problem is gone. Fixes: d0e96f5a71a0 ("USB: xhci: Control transfer support.") Cc: <stable@vger.kernel.org> Signed-off-by: Kuangyi Chiang <ki.chiang65@gmail.com> --- drivers/usb/host/xhci-pci.c | 2 ++ drivers/usb/host/xhci-ring.c | 13 +++++++++++++ drivers/usb/host/xhci.h | 1 + 3 files changed, 16 insertions(+)