Message ID | 20190227221917.62567-1-djkurtz@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | xhci: use iopoll for xhci_handshake | expand |
On Wed, Feb 27, 2019 at 03:19:17PM -0700, Daniel Kurtz wrote: > In cases such as xhci_abort_cmd_ring(), xhci_handshake() is called with > a spin lock held (and local interrupts disabled) with a huge 5 second > timeout. This can translates to 5 million calls to udelay(1). By its > very nature, udelay() is not meant to be precise, it only guarantees to > delay a minimum of 1 microsecond. Therefore the actual delay of > xhci_handshake() can be significantly longer. If the average udelay(1) > is greater than 2.2 us, the total time in xhci_handshake() - with > interrupts disabled can be > 11 seconds triggering the kernel's soft lockup > detector. > > To avoid this, let's replace the open coded io polling loop with one from > iopoll.h that uses a loop timed with the more presumably reliable ktime > infrastructure. > > Signed-off-by: Daniel Kurtz <djkurtz@chromium.org> Looks sane to me, nice fixup. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Is this causing problems on older kernels/devices today such that we should backport this? thanks, greg k-h
On 28.2.2019 9.09, Greg Kroah-Hartman wrote: > On Wed, Feb 27, 2019 at 03:19:17PM -0700, Daniel Kurtz wrote: >> In cases such as xhci_abort_cmd_ring(), xhci_handshake() is called with >> a spin lock held (and local interrupts disabled) with a huge 5 second >> timeout. This can translates to 5 million calls to udelay(1). By its >> very nature, udelay() is not meant to be precise, it only guarantees to >> delay a minimum of 1 microsecond. Therefore the actual delay of >> xhci_handshake() can be significantly longer. If the average udelay(1) >> is greater than 2.2 us, the total time in xhci_handshake() - with >> interrupts disabled can be > 11 seconds triggering the kernel's soft lockup >> detector. >> >> To avoid this, let's replace the open coded io polling loop with one from >> iopoll.h that uses a loop timed with the more presumably reliable ktime >> infrastructure. >> >> Signed-off-by: Daniel Kurtz <djkurtz@chromium.org> > > Looks sane to me, nice fixup. > > Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > > Is this causing problems on older kernels/devices today such that we > should backport this? > A very similar patch was submitted some weeks ago by Andrey Smirnov. https://lore.kernel.org/lkml/20190208014816.21869-1-andrew.smirnov@gmail.com/ His commit message only mentions that readl_poll_timeout_atomic() does the same job, not about any issues with the loop, so I was going to send it forward to usb-next after 5.1-rc (to 5.2). -Mathias
On Thu, Feb 28, 2019 at 12:09 AM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote: > > On Wed, Feb 27, 2019 at 03:19:17PM -0700, Daniel Kurtz wrote: > > In cases such as xhci_abort_cmd_ring(), xhci_handshake() is called with > > a spin lock held (and local interrupts disabled) with a huge 5 second > > timeout. This can translates to 5 million calls to udelay(1). By its > > very nature, udelay() is not meant to be precise, it only guarantees to > > delay a minimum of 1 microsecond. Therefore the actual delay of > > xhci_handshake() can be significantly longer. If the average udelay(1) > > is greater than 2.2 us, the total time in xhci_handshake() - with > > interrupts disabled can be > 11 seconds triggering the kernel's soft lockup > > detector. > > > > To avoid this, let's replace the open coded io polling loop with one from > > iopoll.h that uses a loop timed with the more presumably reliable ktime > > infrastructure. > > > > Signed-off-by: Daniel Kurtz <djkurtz@chromium.org> > > Looks sane to me, nice fixup. > > Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > > Is this causing problems on older kernels/devices today such that we > should backport this? We detected that xhci_handshake timing out can lead to softlockup while debugging a USB issue on a new product. The xhci_handshake timeout itself is a symptom of another underlying problem causing some commands to be aborted. I don't know if any such underlying problems exist on other older devices, but the potential is there so a backport is reasonable. Although, it may just shift the symptom of an underlying problem from a softlockup/oops to some other symptom, like USB just being dead. -Dan > > thanks, > > greg k-h
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 005e65922608e..fde5ff84ba69b 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -16,6 +16,7 @@ #include <linux/slab.h> #include <linux/dmi.h> #include <linux/dma-mapping.h> +#include <linux/iopoll.h> #include "xhci.h" #include "xhci-trace.h" @@ -69,18 +70,18 @@ static bool td_on_ring(struct xhci_td *td, struct xhci_ring *ring) int xhci_handshake(void __iomem *ptr, u32 mask, u32 done, int usec) { u32 result; + int ret; - do { - result = readl(ptr); - if (result == ~(u32)0) /* card removed */ - return -ENODEV; - result &= mask; - if (result == done) - return 0; - udelay(1); - usec--; - } while (usec > 0); - return -ETIMEDOUT; + ret = readl_poll_timeout_atomic(ptr, result, + ((result & mask) == done || (result == ~(u32)0)), + 1, usec); + if (ret) + return ret; + + if (result == ~(u32)0) /* card removed */ + return -ENODEV; + + return 0; } /*
In cases such as xhci_abort_cmd_ring(), xhci_handshake() is called with a spin lock held (and local interrupts disabled) with a huge 5 second timeout. This can translates to 5 million calls to udelay(1). By its very nature, udelay() is not meant to be precise, it only guarantees to delay a minimum of 1 microsecond. Therefore the actual delay of xhci_handshake() can be significantly longer. If the average udelay(1) is greater than 2.2 us, the total time in xhci_handshake() - with interrupts disabled can be > 11 seconds triggering the kernel's soft lockup detector. To avoid this, let's replace the open coded io polling loop with one from iopoll.h that uses a loop timed with the more presumably reliable ktime infrastructure. Signed-off-by: Daniel Kurtz <djkurtz@chromium.org> --- drivers/usb/host/xhci.c | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-)