diff mbox series

xhci: use iopoll for xhci_handshake

Message ID 20190227221917.62567-1-djkurtz@chromium.org (mailing list archive)
State New, archived
Headers show
Series xhci: use iopoll for xhci_handshake | expand

Commit Message

Daniel Kurtz Feb. 27, 2019, 10:19 p.m. UTC
In cases such as xhci_abort_cmd_ring(), xhci_handshake() is called with
a spin lock held (and local interrupts disabled) with a huge 5 second
timeout.  This can translates to 5 million calls to udelay(1).  By its
very nature, udelay() is not meant to be precise, it only guarantees to
delay a minimum of 1 microsecond. Therefore the actual delay of
xhci_handshake() can be significantly longer.  If the average udelay(1)
is greater than 2.2 us, the total time in xhci_handshake() - with
interrupts disabled can be > 11 seconds triggering the kernel's soft lockup
detector.

To avoid this, let's replace the open coded io polling loop with one from
iopoll.h that uses a loop timed with the more presumably reliable ktime
infrastructure.

Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
---
 drivers/usb/host/xhci.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

Comments

Greg KH Feb. 28, 2019, 7:09 a.m. UTC | #1
On Wed, Feb 27, 2019 at 03:19:17PM -0700, Daniel Kurtz wrote:
> In cases such as xhci_abort_cmd_ring(), xhci_handshake() is called with
> a spin lock held (and local interrupts disabled) with a huge 5 second
> timeout.  This can translates to 5 million calls to udelay(1).  By its
> very nature, udelay() is not meant to be precise, it only guarantees to
> delay a minimum of 1 microsecond. Therefore the actual delay of
> xhci_handshake() can be significantly longer.  If the average udelay(1)
> is greater than 2.2 us, the total time in xhci_handshake() - with
> interrupts disabled can be > 11 seconds triggering the kernel's soft lockup
> detector.
> 
> To avoid this, let's replace the open coded io polling loop with one from
> iopoll.h that uses a loop timed with the more presumably reliable ktime
> infrastructure.
> 
> Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>

Looks sane to me, nice fixup.

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Is this causing problems on older kernels/devices today such that we
should backport this?

thanks,

greg k-h
Mathias Nyman Feb. 28, 2019, 8:55 a.m. UTC | #2
On 28.2.2019 9.09, Greg Kroah-Hartman wrote:
> On Wed, Feb 27, 2019 at 03:19:17PM -0700, Daniel Kurtz wrote:
>> In cases such as xhci_abort_cmd_ring(), xhci_handshake() is called with
>> a spin lock held (and local interrupts disabled) with a huge 5 second
>> timeout.  This can translates to 5 million calls to udelay(1).  By its
>> very nature, udelay() is not meant to be precise, it only guarantees to
>> delay a minimum of 1 microsecond. Therefore the actual delay of
>> xhci_handshake() can be significantly longer.  If the average udelay(1)
>> is greater than 2.2 us, the total time in xhci_handshake() - with
>> interrupts disabled can be > 11 seconds triggering the kernel's soft lockup
>> detector.
>>
>> To avoid this, let's replace the open coded io polling loop with one from
>> iopoll.h that uses a loop timed with the more presumably reliable ktime
>> infrastructure.
>>
>> Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
> 
> Looks sane to me, nice fixup.
> 
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> 
> Is this causing problems on older kernels/devices today such that we
> should backport this?
> 

A very similar patch was submitted some weeks ago by Andrey Smirnov.

https://lore.kernel.org/lkml/20190208014816.21869-1-andrew.smirnov@gmail.com/

His commit message only mentions that readl_poll_timeout_atomic() does the same job,
not about any issues with the loop,  so I was going to send it forward to usb-next
after 5.1-rc (to 5.2).

-Mathias
Daniel Kurtz Feb. 28, 2019, 4:49 p.m. UTC | #3
On Thu, Feb 28, 2019 at 12:09 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Wed, Feb 27, 2019 at 03:19:17PM -0700, Daniel Kurtz wrote:
> > In cases such as xhci_abort_cmd_ring(), xhci_handshake() is called with
> > a spin lock held (and local interrupts disabled) with a huge 5 second
> > timeout.  This can translates to 5 million calls to udelay(1).  By its
> > very nature, udelay() is not meant to be precise, it only guarantees to
> > delay a minimum of 1 microsecond. Therefore the actual delay of
> > xhci_handshake() can be significantly longer.  If the average udelay(1)
> > is greater than 2.2 us, the total time in xhci_handshake() - with
> > interrupts disabled can be > 11 seconds triggering the kernel's soft lockup
> > detector.
> >
> > To avoid this, let's replace the open coded io polling loop with one from
> > iopoll.h that uses a loop timed with the more presumably reliable ktime
> > infrastructure.
> >
> > Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
>
> Looks sane to me, nice fixup.
>
> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>
> Is this causing problems on older kernels/devices today such that we
> should backport this?

We detected that xhci_handshake timing out can lead to softlockup
while debugging a USB issue on a new product.  The xhci_handshake
timeout itself is a symptom of another underlying problem causing some
commands to be aborted.  I don't know if any such underlying problems
exist on other older devices, but the potential is there so a backport
is reasonable.  Although, it may just shift the symptom of an
underlying problem from a softlockup/oops to some other symptom, like
USB just being dead.

-Dan

>
> thanks,
>
> greg k-h
diff mbox series

Patch

diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 005e65922608e..fde5ff84ba69b 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -16,6 +16,7 @@ 
 #include <linux/slab.h>
 #include <linux/dmi.h>
 #include <linux/dma-mapping.h>
+#include <linux/iopoll.h>
 
 #include "xhci.h"
 #include "xhci-trace.h"
@@ -69,18 +70,18 @@  static bool td_on_ring(struct xhci_td *td, struct xhci_ring *ring)
 int xhci_handshake(void __iomem *ptr, u32 mask, u32 done, int usec)
 {
 	u32	result;
+	int ret;
 
-	do {
-		result = readl(ptr);
-		if (result == ~(u32)0)		/* card removed */
-			return -ENODEV;
-		result &= mask;
-		if (result == done)
-			return 0;
-		udelay(1);
-		usec--;
-	} while (usec > 0);
-	return -ETIMEDOUT;
+	ret = readl_poll_timeout_atomic(ptr, result,
+			((result & mask) == done || (result == ~(u32)0)),
+			1, usec);
+	if (ret)
+		return ret;
+
+	if (result == ~(u32)0)		/* card removed */
+		return -ENODEV;
+
+	return 0;
 }
 
 /*