diff mbox series

[2/2] usb: ehci: unlink itd/sitds from hardware list if the controller has stopped periodic schedule

Message ID 20230718112600.3969141-2-xu.yang_2@nxp.com (mailing list archive)
State New, archived
Headers show
Series [1/2] usb: host: ehci-sched: add itd/sitd_unlink_urb() functions | expand

Commit Message

Xu Yang July 18, 2023, 11:26 a.m. UTC
In current design, the ehci driver will not unlink itd/sitds from the
hardware list when dequeue isochronous urbs. Rather just wait until they
complete normally or their time slot expires. However, this will cause
issues if the controller has stopped periodic schedule before finished
all periodic schedule. The urb will not be done forever in this case and
then usb_kill/poison_urb() will always wait there.

The ChipIdea IP exactly has a bug: if frame babble occurs during periodic
transfer, PE (PORTSC.bit2) will be cleared and the controller will stop
periodic schedule immediately. So if the user tries to kill or poison
related urb, it will wait there since the urb can't be done forever.

This patch will check if this issue occurs, then it will unlink itd/sitds
from the hardware list depends on the result.

Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
---
 drivers/usb/host/ehci-hcd.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

Comments

Alan Stern July 18, 2023, 2:53 p.m. UTC | #1
On Tue, Jul 18, 2023 at 07:26:00PM +0800, Xu Yang wrote:
> In current design, the ehci driver will not unlink itd/sitds from the
> hardware list when dequeue isochronous urbs. Rather just wait until they
> complete normally or their time slot expires. However, this will cause
> issues if the controller has stopped periodic schedule before finished
> all periodic schedule. The urb will not be done forever in this case and
> then usb_kill/poison_urb() will always wait there.
> 
> The ChipIdea IP exactly has a bug: if frame babble occurs during periodic
> transfer, PE (PORTSC.bit2) will be cleared and the controller will stop
> periodic schedule immediately. So if the user tries to kill or poison
> related urb, it will wait there since the urb can't be done forever.
> 
> This patch will check if this issue occurs, then it will unlink itd/sitds
> from the hardware list depends on the result.

This is not the right approach.  There already is code in the driver for 
unlinking itds/sitds when the periodic schedule isn't running: See how 
the "live" variable is used in scan_isoc().  You don't need to add new 
code to do the same thing, you simply have to make sure that live is set 
to false if the controller has stopped the periodic schedule 
unexpectedly.

(Be very careful about handling the case where CMD_PSE is set and 
STS_PSS is clear.  This can happen when the controller has been told to 
start the periodic schedule but it hasn't done so yet.)

Alan Stern
Xu Yang July 29, 2023, 6:55 a.m. UTC | #2
Hi Alan,

> On Tue, Jul 18, 2023 at 07:26:00PM +0800, Xu Yang wrote:
> > In current design, the ehci driver will not unlink itd/sitds from the
> > hardware list when dequeue isochronous urbs. Rather just wait until they
> > complete normally or their time slot expires. However, this will cause
> > issues if the controller has stopped periodic schedule before finished
> > all periodic schedule. The urb will not be done forever in this case and
> > then usb_kill/poison_urb() will always wait there.
> >
> > The ChipIdea IP exactly has a bug: if frame babble occurs during periodic
> > transfer, PE (PORTSC.bit2) will be cleared and the controller will stop
> > periodic schedule immediately. So if the user tries to kill or poison
> > related urb, it will wait there since the urb can't be done forever.
> >
> > This patch will check if this issue occurs, then it will unlink itd/sitds
> > from the hardware list depends on the result.
> 
> This is not the right approach.  There already is code in the driver for
> unlinking itds/sitds when the periodic schedule isn't running: See how
> the "live" variable is used in scan_isoc().  You don't need to add new
> code to do the same thing, you simply have to make sure that live is set
> to false if the controller has stopped the periodic schedule
> unexpectedly.
> 
> (Be very careful about handling the case where CMD_PSE is set and
> STS_PSS is clear.  This can happen when the controller has been told to
> start the periodic schedule but it hasn't done so yet.)

Many thanks for your comments and suggestions.

Yes, there is a "live" variable in scan_isoc() to handle the case where
root hub has stopped periodic schedule. I have rechecked the root cause
of the issue , that is the USB controller has disabled the port (PE cleared)
and asserted USBERRINT when frame babble is detected, but PEC is not
asserted. Therefore, the SW didn't aware that port has been disabled. 
The periodic schedule keeps running at this moment. Then the SW keeps
sending packets to this port, but all of the transfers will fail. But periodic
schedule will also be disabled after a period of time. Finally, all of the linked
itds are penging there. The code can't enter into scan_isoc() if only USBERRINT
is asserted.

For this issue, I think the SW needs to aware that the port has been disabled
although PEC not asserted by HW. I will send another patch to fix this issue
later.

Thanks,
Xu Yang 

> 
> Alan Stern
Alan Stern July 29, 2023, 2:53 p.m. UTC | #3
On Sat, Jul 29, 2023 at 06:55:07AM +0000, Xu Yang wrote:
> Many thanks for your comments and suggestions.
> 
> Yes, there is a "live" variable in scan_isoc() to handle the case where
> root hub has stopped periodic schedule. I have rechecked the root cause
> of the issue , that is the USB controller has disabled the port (PE cleared)
> and asserted USBERRINT when frame babble is detected, but PEC is not
> asserted. Therefore, the SW didn't aware that port has been disabled. 
> The periodic schedule keeps running at this moment. Then the SW keeps
> sending packets to this port, but all of the transfers will fail. But periodic
> schedule will also be disabled after a period of time. Finally, all of the linked
> itds are penging there. The code can't enter into scan_isoc() if only USBERRINT
> is asserted.

That's not true.  The io_watchdog timer continues to fire periodically 
(at 100 ms intervals) as long as isoc_count > 0.  The timer's handler is 
ehci_work(), which calls scan_isoc() if isoc_count > 0.

> For this issue, I think the SW needs to aware that the port has been disabled
> although PEC not asserted by HW. I will send another patch to fix this issue
> later.

Yes, that sounds like a nasty problem.  The kernel wouldn't realize 
that the device had disconnected.

Alan Stern
Xu Yang Aug. 8, 2023, 10:03 a.m. UTC | #4
Hi Alan,

> On Sat, Jul 29, 2023 at 06:55:07AM +0000, Xu Yang wrote:
> > Many thanks for your comments and suggestions.
> >
> > Yes, there is a "live" variable in scan_isoc() to handle the case where
> > root hub has stopped periodic schedule. I have rechecked the root cause
> > of the issue , that is the USB controller has disabled the port (PE cleared)
> > and asserted USBERRINT when frame babble is detected, but PEC is not
> > asserted. Therefore, the SW didn't aware that port has been disabled.
> > The periodic schedule keeps running at this moment. Then the SW keeps
> > sending packets to this port, but all of the transfers will fail. But periodic
> > schedule will also be disabled after a period of time. Finally, all of the linked
> > itds are penging there. The code can't enter into scan_isoc() if only USBERRINT
> > is asserted.
> 
> That's not true.  The io_watchdog timer continues to fire periodically
> (at 100 ms intervals) as long as isoc_count > 0.  The timer's handler is
> ehci_work(), which calls scan_isoc() if isoc_count > 0.

Yes, the io_watchdog timer will cleanup the isoc periodically as long as
isoc_count > 0. 

I did see all of the linked itds are pending there in my case at the end. After my
debug, I found the chipidea/host.c had set ehci->need_io_watchdog to 0 which
will have impact on turn_on_io_watchdog().

The host has enabled 1 intr endpoint and 2 isoc endpoints from USB camera.
Therefore, ehci->intr_count is always 1 and ehci->isoc_count is changing from
time to time during camera is running. When PE is cleared by HW, isoc_count
may be decreased to 0 after scan_isoc(). When turn_on_io_watchdog() is called,
EHCI_HRTIMER_IO_WATCHDOG event will not be enabled due to isoc_count=0
and need_io_watchdog=0 too. When isoc urb is submited again, enable_periodic()
will still not enable EHCI_HRTIMER_IO_WATCHDOG event due to periodic_count>0.
Then, the linked itds are pending there as long as intr urb has not been completed.

Thanks,
Xu Yang

> 
> > For this issue, I think the SW needs to aware that the port has been disabled
> > although PEC not asserted by HW. I will send another patch to fix this issue
> > later.
> 
> Yes, that sounds like a nasty problem.  The kernel wouldn't realize
> that the device had disconnected.
> 
> Alan Stern
Alan Stern Aug. 8, 2023, 3:13 p.m. UTC | #5
On Tue, Aug 08, 2023 at 10:03:39AM +0000, Xu Yang wrote:
> Hi Alan,
> 
> > On Sat, Jul 29, 2023 at 06:55:07AM +0000, Xu Yang wrote:
> > > Many thanks for your comments and suggestions.
> > >
> > > Yes, there is a "live" variable in scan_isoc() to handle the case where
> > > root hub has stopped periodic schedule. I have rechecked the root cause
> > > of the issue , that is the USB controller has disabled the port (PE cleared)
> > > and asserted USBERRINT when frame babble is detected, but PEC is not
> > > asserted. Therefore, the SW didn't aware that port has been disabled.
> > > The periodic schedule keeps running at this moment. Then the SW keeps
> > > sending packets to this port, but all of the transfers will fail. But periodic
> > > schedule will also be disabled after a period of time. Finally, all of the linked
> > > itds are penging there. The code can't enter into scan_isoc() if only USBERRINT
> > > is asserted.
> > 
> > That's not true.  The io_watchdog timer continues to fire periodically
> > (at 100 ms intervals) as long as isoc_count > 0.  The timer's handler is
> > ehci_work(), which calls scan_isoc() if isoc_count > 0.
> 
> Yes, the io_watchdog timer will cleanup the isoc periodically as long as
> isoc_count > 0. 
> 
> I did see all of the linked itds are pending there in my case at the end. After my
> debug, I found the chipidea/host.c had set ehci->need_io_watchdog to 0 which
> will have impact on turn_on_io_watchdog().
> 
> The host has enabled 1 intr endpoint and 2 isoc endpoints from USB camera.
> Therefore, ehci->intr_count is always 1 and ehci->isoc_count is changing from
> time to time during camera is running. When PE is cleared by HW, isoc_count
> may be decreased to 0 after scan_isoc(). When turn_on_io_watchdog() is called,
> EHCI_HRTIMER_IO_WATCHDOG event will not be enabled due to isoc_count=0
> and need_io_watchdog=0 too. When isoc urb is submited again, enable_periodic()
> will still not enable EHCI_HRTIMER_IO_WATCHDOG event due to periodic_count>0.

Ooh, that's a bug.  enable_periodic() should call turn_on_io_watchdog() 
no matter what value periodic_count has.  Would you like to fix it?

Alan Stern
Xu Yang Aug. 9, 2023, 2:06 a.m. UTC | #6
Hi Alan,

> On Tue, Aug 08, 2023 at 10:03:39AM +0000, Xu Yang wrote:
> > Hi Alan,
> >
> > > On Sat, Jul 29, 2023 at 06:55:07AM +0000, Xu Yang wrote:
> > > > Many thanks for your comments and suggestions.
> > > >
> > > > Yes, there is a "live" variable in scan_isoc() to handle the case where
> > > > root hub has stopped periodic schedule. I have rechecked the root cause
> > > > of the issue , that is the USB controller has disabled the port (PE cleared)
> > > > and asserted USBERRINT when frame babble is detected, but PEC is not
> > > > asserted. Therefore, the SW didn't aware that port has been disabled.
> > > > The periodic schedule keeps running at this moment. Then the SW keeps
> > > > sending packets to this port, but all of the transfers will fail. But periodic
> > > > schedule will also be disabled after a period of time. Finally, all of the linked
> > > > itds are penging there. The code can't enter into scan_isoc() if only USBERRINT
> > > > is asserted.
> > >
> > > That's not true.  The io_watchdog timer continues to fire periodically
> > > (at 100 ms intervals) as long as isoc_count > 0.  The timer's handler is
> > > ehci_work(), which calls scan_isoc() if isoc_count > 0.
> >
> > Yes, the io_watchdog timer will cleanup the isoc periodically as long as
> > isoc_count > 0.
> >
> > I did see all of the linked itds are pending there in my case at the end. After my
> > debug, I found the chipidea/host.c had set ehci->need_io_watchdog to 0 which
> > will have impact on turn_on_io_watchdog().
> >
> > The host has enabled 1 intr endpoint and 2 isoc endpoints from USB camera.
> > Therefore, ehci->intr_count is always 1 and ehci->isoc_count is changing from
> > time to time during camera is running. When PE is cleared by HW, isoc_count
> > may be decreased to 0 after scan_isoc(). When turn_on_io_watchdog() is called,
> > EHCI_HRTIMER_IO_WATCHDOG event will not be enabled due to isoc_count=0
> > and need_io_watchdog=0 too. When isoc urb is submited again, enable_periodic()
> > will still not enable EHCI_HRTIMER_IO_WATCHDOG event due to periodic_count>0.
> 
> Ooh, that's a bug.  enable_periodic() should call turn_on_io_watchdog()
> no matter what value periodic_count has.  Would you like to fix it?

Sure. Will try to fix it.

Thanks,
Xu Yang

> 
> Alan Stern
diff mbox series

Patch

diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index a1930db0da1c..26dc1d1ae5e8 100644
--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -930,10 +930,41 @@  static int ehci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status)
 
 	if (usb_pipetype(urb->pipe) == PIPE_ISOCHRONOUS) {
 		/*
-		 * We don't expedite dequeue for isochronous URBs.
+		 * 1. We don't expedite dequeue for isochronous URBs.
 		 * Just wait until they complete normally or their
 		 * time slot expires.
+		 *
+		 * 2. The ChipIdea IP has a bug: if frame babble occurs,
+		 * PE will be cleared and the controller will stop periodic
+		 * schedule. So if we don't force dequeue this urb, it
+		 * won't be done forever. Here, a force dequeue is needed
+		 * for this case.
 		 */
+		unsigned 		i = HCS_N_PORTS (ehci->hcs_params);
+		bool 			need_force_dequeue = false;
+
+		while (i--) {
+			int pstatus;
+
+			pstatus = ehci_readl(ehci,
+					&ehci->regs->port_status[i]);
+
+			/* Any cleared PE means controller has stopped
+			 * periodic schedule.
+			 */
+			if (!(pstatus & PORT_PE)) {
+				need_force_dequeue = true;
+				break;
+			}
+		}
+
+		if (!need_force_dequeue)
+			goto done;
+
+		if (urb->dev->speed == USB_SPEED_HIGH)
+			itd_unlink_urb(ehci, urb);
+		else
+			sitd_unlink_urb(ehci, urb);
 	} else {
 		qh = (struct ehci_qh *) urb->hcpriv;
 		qh->unlink_reason |= QH_UNLINK_REQUESTED;