diff mbox series

[RFC,RFT] usb: hcd: Add a usb_device argument to hc_driver.endpoint_reset()

Message ID 20250408121817.6ae8defd@foxbook (mailing list archive)
State Superseded
Headers show
Series [RFC,RFT] usb: hcd: Add a usb_device argument to hc_driver.endpoint_reset() | expand

Commit Message

Michal Pecio April 8, 2025, 10:18 a.m. UTC
xHCI needs usb_device in this callback so it employed some hacks that
proved unreliable in the long term and made the code a little tricky.

Make USB core supply it directly and simplify xhci_endpoint_reset().
Use xhci_check_args() to prevent resetting emulated endpoints of root
hubs and to deduplicate argument validation and improve debuggability.

Update ehci_endpoint_reset(), which is the only other such callback,
to accept (and ignore) the new argument.

This fixes the root cause of a 6.15-rc1 regression reported by Paul,
which I was able to reproduce locally. It also solves the general
problem of xhci_endpoint_reset() becoming a no-op after device reset
or changing configuration or altsetting. Although nobody complained
because halted endpoints are reset automatically by xhci_hcd, it was
a bug - sometimes class drivers want to reset not halted endpoints.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Closes: https://lore.kernel.org/linux-usb/c279bd85-3069-4841-b1be-20507ac9f2d7@molgen.mpg.de/
Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
---

Is such change acceptable to interested parties?

It solves the problem completely for me, because as Alan said,
core calls endpoint_reset() after installing a new config or alt
to notify HCDs that ignore reset_device(), and also those which
implement it incompletely, I guess ;)

Unlike clearing EP_STALLED on reset_device() or drop_endpoint(),
this also fixes cases when another STALL happens after device
reset and the device is not reset again. For example, I see that
when I insert a card after the original problem happens.

At this point I can insert or remove the card, plug or unplug
the reader and reload ums-realtek in any order, it all works.

Paul, could you check if this patch works on your hardware too?

 drivers/usb/core/hcd.c      |  2 +-
 drivers/usb/host/ehci-hcd.c |  3 ++-
 drivers/usb/host/xhci.c     | 27 ++++++++-------------------
 include/linux/usb/hcd.h     |  2 +-
 4 files changed, 12 insertions(+), 22 deletions(-)

Comments

Mathias Nyman April 8, 2025, 1:55 p.m. UTC | #1
On 8.4.2025 13.18, Michał Pecio wrote:
> xHCI needs usb_device in this callback so it employed some hacks that
> proved unreliable in the long term and made the code a little tricky.
> 
> Make USB core supply it directly and simplify xhci_endpoint_reset().
> Use xhci_check_args() to prevent resetting emulated endpoints of root
> hubs and to deduplicate argument validation and improve debuggability.
> 
> Update ehci_endpoint_reset(), which is the only other such callback,
> to accept (and ignore) the new argument.
> 
> This fixes the root cause of a 6.15-rc1 regression reported by Paul,
> which I was able to reproduce locally. It also solves the general
> problem of xhci_endpoint_reset() becoming a no-op after device reset
> or changing configuration or altsetting. Although nobody complained
> because halted endpoints are reset automatically by xhci_hcd, it was
> a bug - sometimes class drivers want to reset not halted endpoints.
> 
> Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
> Closes: https://lore.kernel.org/linux-usb/c279bd85-3069-4841-b1be-20507ac9f2d7@molgen.mpg.de/
> Signed-off-by: Michal Pecio <michal.pecio@gmail.com>
> ---
> 
> Is such change acceptable to interested parties?

Looks like an improvement, and will help clear the EP_STALLED flag
eventually in device reset case.

There are some issues still unsolved due to how xhci endpoints end up being handled
in usb core usb_reset_and_verify_device().

usb_reset_and_verify_device()
   hub_port_init()
     hcd->driver->reset_device(hcd, udev);          /*1 xhci frees ep rings, loses td_list heads */
   usb_hcd_alloc_bandwidth(new_config, NULL, NULL)  /*2 xhci drop+add ep, allocated new ep rings */
   usb_control_msg(udev, usb_sndctrlpipe(...,USB_REQ_SET_CONFIGURATION,...)
   for (interfaces) {
     if (AlternateSetting == 0) {
       usb_disable_interface()  /*3 flush urbs, ->xhci_urb_dequeue() */
       usb_enable_interface()   /*4 clear EP_STALLED flag */
     } else {
       usb_set_interface()
     }

1. driver->reset_device will free all xhci endpoint rings, and lose td_list head, but
    keep cancelled_td_list and ep->ep_state flags. xHC issues reset device command
    setting all internal ep states in xci to "disabled".

2. usb hcd_alloc_bandwith will drop+add xhci endpoints for this configuration,
    allocate new endpoint rings, and inits new td_list head.
    Old cancelled_td_list and ep_state flags are still set, not matching ring.

3. usb_disable_interface() will flush all pending URBs calling xhci_urb_dequeue().
    xhci_urb_dequeue() makes decision based on stale ep_state flags.
    May start to cancel/giveback urbs in old cancelled_td_list for tds that existed
    on old freed ring. will also set host_ep->hcpriv to null

4. usb_enable_interface() calls xhci_endpoint_reset() that finally clears
    the EP_STALLED flag (udev now found thanks to this patch)

Disabling endpoints, like calling usb_disable_interface() in step 3 should be
done before calling  usb_hcd_alloc_bandwith().
This was fixed in other places such as usb_set_interface() and usb_reset_config()

We might need to clean up ep_state flags and give back cancelled URBs in
xhci_discover_or_reset_device() after the reset device xhci command completion.

Thanks
Mathias
Michal Pecio April 9, 2025, 10:18 a.m. UTC | #2
On Tue, 8 Apr 2025 16:55:24 +0300, Mathias Nyman wrote:
> 1. driver->reset_device will free all xhci endpoint rings, and lose
> td_list head, but keep cancelled_td_list and ep->ep_state flags. xHC
> issues reset device command setting all internal ep states in xci to
> "disabled".
> 
> 2. usb hcd_alloc_bandwith will drop+add xhci endpoints for this
> configuration, allocate new endpoint rings, and inits new td_list
> head. Old cancelled_td_list and ep_state flags are still set, not
> matching ring.
> 
> 3. usb_disable_interface() will flush all pending URBs calling
> xhci_urb_dequeue(). xhci_urb_dequeue() makes decision based on stale
> ep_state flags. May start to cancel/giveback urbs in old
> cancelled_td_list for tds that existed on old freed ring. will also
> set host_ep->hcpriv to null
> 
> 4. usb_enable_interface() calls xhci_endpoint_reset() that finally
> clears the EP_STALLED flag (udev now found thanks to this patch)
> 
> Disabling endpoints, like calling usb_disable_interface() in step 3
> should be done before calling  usb_hcd_alloc_bandwith().
> This was fixed in other places such as usb_set_interface() and
> usb_reset_config()

I haven't thought about it, but you are right. This means that my
commit message is wrong to suggest that the problem occurs after
altsetting changes, it is apparently unique to device reset case.

> We might need to clean up ep_state flags and give back cancelled URBs
> in xhci_discover_or_reset_device() after the reset device xhci
> command completion.

I guess it wouldn't be strictly necessary if core flushed all endpoints
before resetting. I frankly always assumed that it does so, because it
also makes sense for other HCDs which don't even define reset_device().

There seems to be nothing stopping them from talking to a device while
it is undergoing a reset and re-configuration, seems a little risky
given that HW isn't exactly free of its own bugs.

Speaking of xhci, I wonder if this could also be responsible for some
xHC crashes during enumeration. I recall that those Etron quirk patches
mentioned events for a disabled endpoint and "HC died" in some cases.

Regards,
Michal
Alan Stern April 9, 2025, 2:13 p.m. UTC | #3
On Wed, Apr 09, 2025 at 12:18:19PM +0200, Michał Pecio wrote:
> On Tue, 8 Apr 2025 16:55:24 +0300, Mathias Nyman wrote:
> > 1. driver->reset_device will free all xhci endpoint rings, and lose
> > td_list head, but keep cancelled_td_list and ep->ep_state flags. xHC
> > issues reset device command setting all internal ep states in xci to
> > "disabled".
> > 
> > 2. usb hcd_alloc_bandwith will drop+add xhci endpoints for this
> > configuration, allocate new endpoint rings, and inits new td_list
> > head. Old cancelled_td_list and ep_state flags are still set, not
> > matching ring.
> > 
> > 3. usb_disable_interface() will flush all pending URBs calling
> > xhci_urb_dequeue(). xhci_urb_dequeue() makes decision based on stale
> > ep_state flags. May start to cancel/giveback urbs in old
> > cancelled_td_list for tds that existed on old freed ring. will also
> > set host_ep->hcpriv to null
> > 
> > 4. usb_enable_interface() calls xhci_endpoint_reset() that finally
> > clears the EP_STALLED flag (udev now found thanks to this patch)
> > 
> > Disabling endpoints, like calling usb_disable_interface() in step 3
> > should be done before calling  usb_hcd_alloc_bandwith().
> > This was fixed in other places such as usb_set_interface() and
> > usb_reset_config()
> 
> I haven't thought about it, but you are right. This means that my
> commit message is wrong to suggest that the problem occurs after
> altsetting changes, it is apparently unique to device reset case.
> 
> > We might need to clean up ep_state flags and give back cancelled URBs
> > in xhci_discover_or_reset_device() after the reset device xhci
> > command completion.
> 
> I guess it wouldn't be strictly necessary if core flushed all endpoints
> before resetting. I frankly always assumed that it does so, because it
> also makes sense for other HCDs which don't even define reset_device().
> 
> There seems to be nothing stopping them from talking to a device while
> it is undergoing a reset and re-configuration, seems a little risky
> given that HW isn't exactly free of its own bugs.

The core does not explicitly flush endpoints before resetting a device.  
However, it does notify the class drivers' pre_reset callback, which is 
supposed to unlink all the URBs used by that driver.  If a driver 
doesn't have a pre_reset callback, the core unbinds it from the device 
(which will unlink all its URBs).  _If_ everything is working properly, 
there shouldn't be any outstanding URBs when the reset takes place.

Either way, though, the core doesn't invoke the HCD's endpoint_reset or 
endpoint_disable callback before the reset.  If you think the core needs 
to do more, or needs to issue the callbacks in a different order, let me 
know.

Alan Stern

> Speaking of xhci, I wonder if this could also be responsible for some
> xHC crashes during enumeration. I recall that those Etron quirk patches
> mentioned events for a disabled endpoint and "HC died" in some cases.
> 
> Regards,
> Michal
Michal Pecio April 15, 2025, 8:38 a.m. UTC | #4
On Wed, 9 Apr 2025 10:13:50 -0400, Alan Stern wrote:
> The core does not explicitly flush endpoints before resetting a
> device. However, it does notify the class drivers' pre_reset
> callback, which is supposed to unlink all the URBs used by that
> driver.  If a driver doesn't have a pre_reset callback, the core
> unbinds it from the device (which will unlink all its URBs).  _If_
> everything is working properly, there shouldn't be any outstanding
> URBs when the reset takes place.

Thank you for clarification. This doesn't look too bad and I currently
have no concrete cases of the mechanism failing to work.
 
> Either way, though, the core doesn't invoke the HCD's endpoint_reset
> or endpoint_disable callback before the reset.  If you think the core
> needs to do more, or needs to issue the callbacks in a different
> order, let me know.

The problem is a matter of mismatched expectations: the core treats
endpoint_disable() as temporary, because "classic" HCDs free their
ep->hcpriv and recreate it quietly on the next URBs submission. And
their endpoint_reset() during this time simply does nothing.

But xhci considers it more permanent, like the last thing before
drop_endpoint(). It too clears ep->hcpriv, but here hcpriv is not
the endpoint state, it's usb_device pointer saved by add_endpoint()
and required for operation. The driver drops it and it's screwed.

Moving all usb_disable_interface() calls before their corresponding
usb_hcd_alloc_bandwidth() would meet xhci expectations, but IDK if
it would work in general. As far as I see in usb_set_interface() for
example, the control request is only done after successful bandwidth
allocation and the interface swap only after a successful request.

My patch addresses the problem from xhci side, by adapting to core
expectations. As far as I see, only xhci_endpoint_reset() is broken
by this add_endpoint() -> endpoint_disable() sequence, so I fix it
by removing the dependence on ep->hcpriv. Alternatively, we could
stop clearing ep->hcpriv, but I'm not sure if it's reliable. No HCD
depends on ep->hcpriv being preserved after endpoint_disable().

I will do a v2 because Mathias expressed interest in this patch for
cleanup's sake alone, but the kernel test robot found build issues.

Regards,
Michal
diff mbox series

Patch

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index a63c793bac21..d2433807a397 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1986,7 +1986,7 @@  void usb_hcd_reset_endpoint(struct usb_device *udev,
 	struct usb_hcd *hcd = bus_to_hcd(udev->bus);
 
 	if (hcd->driver->endpoint_reset)
-		hcd->driver->endpoint_reset(hcd, ep);
+		hcd->driver->endpoint_reset(hcd, udev, ep);
 	else {
 		int epnum = usb_endpoint_num(&ep->desc);
 		int is_out = usb_endpoint_dir_out(&ep->desc);
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index 6d1d190c914d..813cdedb14ab 100644
--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -1044,7 +1044,8 @@  ehci_endpoint_disable (struct usb_hcd *hcd, struct usb_host_endpoint *ep)
 }
 
 static void
-ehci_endpoint_reset(struct usb_hcd *hcd, struct usb_host_endpoint *ep)
+ehci_endpoint_reset(struct usb_hcd *hcd, struct usb_device *udev,
+		    struct usb_host_endpoint *ep)
 {
 	struct ehci_hcd		*ehci = hcd_to_ehci(hcd);
 	struct ehci_qh		*qh;
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 0452b8d65832..5bf89ba7e2b8 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -3161,11 +3161,10 @@  static void xhci_endpoint_disable(struct usb_hcd *hcd,
  * resume. A new vdev will be allocated later by xhci_discover_or_reset_device()
  */
 
-static void xhci_endpoint_reset(struct usb_hcd *hcd,
+static void xhci_endpoint_reset(struct usb_hcd *hcd, struct usb_device *udev,
 		struct usb_host_endpoint *host_ep)
 {
 	struct xhci_hcd *xhci;
-	struct usb_device *udev;
 	struct xhci_virt_device *vdev;
 	struct xhci_virt_ep *ep;
 	struct xhci_input_control_ctx *ctrl_ctx;
@@ -3175,7 +3174,12 @@  static void xhci_endpoint_reset(struct usb_hcd *hcd,
 	u32 ep_flag;
 	int err;
 
+	err = xhci_check_args(hcd, udev, host_ep, 1, true, __func__);
+	if (err <= 0)
+		return;
+
 	xhci = hcd_to_xhci(hcd);
+	vdev = xhci->devs[udev->slot_id];
 	ep_index = xhci_get_endpoint_index(&host_ep->desc);
 
 	/*
@@ -3185,28 +3189,13 @@  static void xhci_endpoint_reset(struct usb_hcd *hcd,
 	 */
 	if (usb_endpoint_xfer_control(&host_ep->desc) && ep_index == 0) {
 
-		udev = container_of(host_ep, struct usb_device, ep0);
-		if (udev->speed != USB_SPEED_FULL || !udev->slot_id)
-			return;
-
-		vdev = xhci->devs[udev->slot_id];
-		if (!vdev || vdev->udev != udev)
-			return;
-
-		xhci_check_ep0_maxpacket(xhci, vdev);
+		if (udev->speed == USB_SPEED_FULL)
+			xhci_check_ep0_maxpacket(xhci, vdev);
 
 		/* Nothing else should be done here for ep0 during ep reset */
 		return;
 	}
 
-	if (!host_ep->hcpriv)
-		return;
-	udev = (struct usb_device *) host_ep->hcpriv;
-	vdev = xhci->devs[udev->slot_id];
-
-	if (!udev->slot_id || !vdev)
-		return;
-
 	ep = &vdev->eps[ep_index];
 
 	spin_lock_irqsave(&xhci->lock, flags);
diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h
index ac95e7c89df5..179c85337eff 100644
--- a/include/linux/usb/hcd.h
+++ b/include/linux/usb/hcd.h
@@ -304,7 +304,7 @@  struct hc_driver {
 
 	/* (optional) reset any endpoint state such as sequence number
 	   and current window */
-	void	(*endpoint_reset)(struct usb_hcd *hcd,
+	void	(*endpoint_reset)(struct usb_hcd *hcd, struct usb_device *udev,
 			struct usb_host_endpoint *ep);
 
 	/* root hub support */