Message ID | 20241014211122.7cb5b133@foxbook (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Fix the NEC stop bug workaround | expand |
On Mon, Oct 14, 2024 at 09:11:22PM +0200, Michal Pecio wrote: > NEC controllers have a bug, where stopping an endpoint soon after it > has been restarted doesn't quite work as expected. This forces us to > track whether each Stop Endpoint command is expected to fail or not. > > Reuse this infrastracture to warn about similar bugs on other chips, > if any are found. > > Signed-off-by: Michal Pecio <michal.pecio@gmail.com> > --- > drivers/usb/host/xhci-ring.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c > index c0efb4d34ab9..c326b86d713c 100644 > --- a/drivers/usb/host/xhci-ring.c > +++ b/drivers/usb/host/xhci-ring.c > @@ -1186,8 +1186,11 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id, > * So keep retrying until the command clearly succeeds. > * Not clear what to do if other HCs have similar bugs. > */ > - if (!(xhci->quirks & XHCI_NEC_HOST)) > + if (!(xhci->quirks & XHCI_NEC_HOST)) { > + xhci_warn(xhci, "Unhandled Stop Endpoint failure on slot %d ep_index %d\n", > + slot_id, ep_index); If a user sees this, what are they supposed to do? This is a hardware bug, but with this we are going to get reports of "something broke in the kernel", right? Why not make it just more informative, like: xhci_info(xhci, "hardware can not deal with... or something like that so that people know we know about the bug, and are working around it, but that it's not our issue, but rather the hardware that is at fault? thanks, greg k-h
Hi, > + if (!(xhci->quirks & XHCI_NEC_HOST)) { > > + xhci_warn(xhci, "Unhandled Stop Endpoint failure on slot %d ep_index %d\n", > > + slot_id, ep_index); > > If a user sees this, what are they supposed to do? This is a hardware > bug, but with this we are going to get reports of "something broke in > the kernel", right? Why not make it just more informative, like: > xhci_info(xhci, "hardware can not deal with... > > or something like that so that people know we know about the bug, and > are working around it, but that it's not our issue, but rather the > hardware that is at fault? Yes, the point is that ideally some users would report when they see it. This is not a warning for a bug we know about and work around, it's for hypothetical bugs that aren't known. If I'm adding code to handle a bug of my HC, I might as well use it to sanity check other HCs for free. Ideally they are OK and no one will ever see it. I tested several HCs in January and found no similar issues. Maybe I will test them again. I specifically worded it as something the kernel could do better rather than the kernel just whining about something incomprehensibly. For the latter, we already have the absolute classic "ERROR Transfer event TRB DMA ptr not part of current TD", which would often follow this one, but offer little clue about the root cause. It's kinda too late by then. Regards, Michal
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index c0efb4d34ab9..c326b86d713c 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -1186,8 +1186,11 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id, * So keep retrying until the command clearly succeeds. * Not clear what to do if other HCs have similar bugs. */ - if (!(xhci->quirks & XHCI_NEC_HOST)) + if (!(xhci->quirks & XHCI_NEC_HOST)) { + xhci_warn(xhci, "Unhandled Stop Endpoint failure on slot %d ep_index %d\n", + slot_id, ep_index); break; + } fallthrough; case EP_STATE_RUNNING:
NEC controllers have a bug, where stopping an endpoint soon after it has been restarted doesn't quite work as expected. This forces us to track whether each Stop Endpoint command is expected to fail or not. Reuse this infrastracture to warn about similar bugs on other chips, if any are found. Signed-off-by: Michal Pecio <michal.pecio@gmail.com> --- drivers/usb/host/xhci-ring.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)