diff mbox

xHCI problem? [was Re: Erratic USB device behavior and device loss]

Message ID Pine.LNX.4.44L0.1609161729340.1657-100000@iolanthe.rowland.org
State New, archived
Headers show

Commit Message

Alan Stern Sept. 16, 2016, 9:40 p.m. UTC
On Fri, 16 Sep 2016, Ritesh Raj Sarraf wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
> 
> Hello Ulf and Alan,
> 
> On Thu, 2016-09-15 at 10:16 -0400, Alan Stern wrote:
> > > ---
> > >  drivers/mmc/host/rtsx_usb_sdmmc.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/drivers/mmc/host/rtsx_usb_sdmmc.c
> > > b/drivers/mmc/host/rtsx_usb_sdmmc.c
> > > index 6c71fc9..a59c7fa 100644
> > > --- a/drivers/mmc/host/rtsx_usb_sdmmc.c
> > > +++ b/drivers/mmc/host/rtsx_usb_sdmmc.c
> > > @@ -1314,6 +1314,7 @@ static void rtsx_usb_update_led(struct work_struct
> > *work)
> > >                 container_of(work, struct rtsx_usb_sdmmc, led_work);
> > >         struct rtsx_ucr *ucr = host->ucr;
> > > 
> > > +       pm_runtime_get_sync(sdmmc_dev(host));
> > >         mutex_lock(&ucr->dev_mutex);
> > > 
> > >         if (host->led.brightness == LED_OFF)
> > > @@ -1322,6 +1323,7 @@ static void rtsx_usb_update_led(struct work_struct
> > *work)
> > >                 rtsx_usb_turn_on_led(ucr);
> > > 
> > >         mutex_unlock(&ucr->dev_mutex);
> > > +       pm_runtime_put(sdmmc_dev(host));
> > >  }
> > >  #endif
> > > 
> > > -- 
> > > 
> > > Although, I doubt the above is the main reason to the issues we see.
> > 
> > I don't know -- it could well be the reason.  The symptoms are 
> > definitely what you would expect to see if some thread was doing I/O 
> > without calling the pm_runtime_* routines.

> I was able to hit it again. Please find the usbmon trace at:
> https://people.debian.org/~rrs/tmp/usb-4.8.0-rc6ulf1+.log.gz

 
We're still getting runtime suspends, but now at 2-second intervals.   
This is partly because the driver isn't calling
pm_runtime_mark_last_busy(), but there may be more to it.  The 2-second 
period is the default autosuspend timeout for USB devices.  However, I 
don't see the activity that rtsx_usb_get_card_status() should produce 
when rtsx_usb_suspend() runs; I don't know why not.

We're also getting occasional I/O attempts while the device is
suspended.  They must be on some other pathway, not the one fixed by
the patch above.  Let's see if we can find out just where they come
from.

Ritesh, please try applying this patch on top of the previous one.  It 
will produce output in the kernel log whenever these bad I/O attempts 
occur.  Also, enable dynamic debugging for the rtsx_usb driver:

	echo 'module rtsx_usb =p' >/sys/kernel/debug/dynamic_debug/control

before starting the test.  (You may need to mount a debugfs filesystem 
on /sys/kernel/debug first.)

Alan Stern





--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Ritesh Raj Sarraf Sept. 17, 2016, 11:42 a.m. UTC | #1
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hello Alan,


On Fri, 2016-09-16 at 17:40 -0400, Alan Stern wrote:
> We're still getting runtime suspends, but now at 2-second intervals.   
> This is partly because the driver isn't calling
> pm_runtime_mark_last_busy(), but there may be more to it.  The 2-second 
> period is the default autosuspend timeout for USB devices.  However, I 
> don't see the activity that rtsx_usb_get_card_status() should produce 
> when rtsx_usb_suspend() runs; I don't know why not.
> 
> We're also getting occasional I/O attempts while the device is
> suspended.  They must be on some other pathway, not the one fixed by
> the patch above.  Let's see if we can find out just where they come
> from.
> 
> Ritesh, please try applying this patch on top of the previous one.  It 
> will produce output in the kernel log whenever these bad I/O attempts 
> occur.  Also, enable dynamic debugging for the rtsx_usb driver:
> 

Please find links to the usbmon trace and the kernel trace.

https://people.debian.org/~rrs/tmp/4.8.0-rc6ulf1alan1+.kern.log
https://people.debian.org/~rrs/tmp/usb-4.8.0-rc6ulf1alan1+.log.gz

Thanks.

>         echo 'module rtsx_usb =p' >/sys/kernel/debug/dynamic_debug/control
> 
> before starting the test.  (You may need to mount a debugfs filesystem 
> on /sys/kernel/debug first.)
> 
> Alan Stern
> 
> 
> 
> Index: usb-4.x/drivers/usb/core/hcd.c
> ===================================================================
> --- usb-4.x.orig/drivers/usb/core/hcd.c
> +++ usb-4.x/drivers/usb/core/hcd.c
> @@ -1647,6 +1647,8 @@ int usb_hcd_submit_urb (struct urb *urb,
>                 status = map_urb_for_dma(hcd, urb, mem_flags);
>                 if (likely(status == 0)) {
>                         status = hcd->driver->urb_enqueue(hcd, urb,
> mem_flags);
> +                       if (status == -EHOSTUNREACH)
> +                               dump_stack();
>                         if (unlikely(status))
>                                 unmap_urb_for_dma(hcd, urb);
>                 }
> 
- -- 
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJX3SwNAAoJEKY6WKPy4XVpMaYQAK3p/meduq2SLRKjcLuher/+
U6W0+6t1MJmNLZgArqEYLprGQs8dboDUVuYdOkpDyjsL3oRVc2RFRhKP4n5uyeqf
UOyyJC/Dn8JpW5abQPdQOi8/zeY019P1MPKd/lAvjs+MXOdRvOluwne3KGeVJzrc
nWNt9YMZCvxscbXjJVqNWFh8utg6BVVoJ72sqkHYL6N+cWKwKb4QphgNXbhoPQq3
K7KwsBywQHty/Wf4TXB6n8z/6zR6uNHQjaveboUvidkMhWXYFSVag6ba6ZOsAyoL
nPWvowTTKO3snKh2AzHhJEzgky2EXMoxG5+GzA3nBA9suAapxuS7tTSt5YAShupv
at1FASeb8kyZ7vt3Srq7WQN7OIlER9zvVvapENJHwqATnHAQ+35h+7o1CrZLjDyF
UB5qKWOOQlFmiEfjjs15bLumBQqA4vX7JvYqJxNX7AtOPUXZHvs85eX7S4s18TZ7
OYO5JkS0xcLD6HMvuxzwO+UDS5DYknwma8gAPvHg1mX8QQMOPNkbnd7Igt9kjhMR
ZioeP0xLyyUGwjGErQpzgQwHlk5bMQhQ4iiOxi4nz4aCWLCs2YwzojfwtfDxSYnV
rCSjWEAanG3eRP7iPBtbdrbFmWsjSsavopXXlLPNYDjBjnpm2/IcyIlC/dTKKYJz
0rkc0hreSRuSDMPxbkfV
=d/iL
-----END PGP SIGNATURE-----

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Sept. 18, 2016, 1:42 a.m. UTC | #2
On Sat, 17 Sep 2016, Ritesh Raj Sarraf wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
> 
> Hello Alan,
> 
> 
> On Fri, 2016-09-16 at 17:40 -0400, Alan Stern wrote:
> > We're still getting runtime suspends, but now at 2-second intervals.   
> > This is partly because the driver isn't calling
> > pm_runtime_mark_last_busy(), but there may be more to it.  The 2-second 
> > period is the default autosuspend timeout for USB devices.  However, I 
> > don't see the activity that rtsx_usb_get_card_status() should produce 
> > when rtsx_usb_suspend() runs; I don't know why not.
> > 
> > We're also getting occasional I/O attempts while the device is
> > suspended.  They must be on some other pathway, not the one fixed by
> > the patch above.  Let's see if we can find out just where they come
> > from.
> > 
> > Ritesh, please try applying this patch on top of the previous one.  It 
> > will produce output in the kernel log whenever these bad I/O attempts 
> > occur.  Also, enable dynamic debugging for the rtsx_usb driver:
> > 
> 
> Please find links to the usbmon trace and the kernel trace.
> 
> https://people.debian.org/~rrs/tmp/4.8.0-rc6ulf1alan1+.kern.log
> https://people.debian.org/~rrs/tmp/usb-4.8.0-rc6ulf1alan1+.log.gz

Well, this is pretty clear:

Sep 17 15:55:52 learner kernel: CPU: 1 PID: 535 Comm: rtsx_usb_ms_1 Tainted: G     U          4.8.0-rc6ulf1alan1+ #19
Sep 17 15:55:52 learner kernel: Hardware name: LENOVO 20344/INVALID, BIOS 96CN31WW(V1.17) 07/21/2015
Sep 17 15:55:52 learner kernel:  0000000000000000 ffffffff81314be5 ffff8802476746c0 0000000002400000
Sep 17 15:55:52 learner kernel:  ffffffffa016f719 00000000523bec00 ffff88025f255780 ffff88024feff600
Sep 17 15:55:52 learner kernel:  0000000000018080 0000000000000000 ffff88025f258080 ffffffff815a0e60
Sep 17 15:55:52 learner kernel: Call Trace:
Sep 17 15:55:52 learner kernel:  [<ffffffff81314be5>] ? dump_stack+0x7d/0xb8
Sep 17 15:55:52 learner kernel:  [<ffffffffa016f719>] ? usb_hcd_submit_urb+0x3c9/0xad0 [usbcore]
Sep 17 15:55:52 learner kernel:  [<ffffffff815a0e60>] ? _raw_spin_lock_irqsave+0x20/0x47
Sep 17 15:55:52 learner kernel:  [<ffffffff810d5c8b>] ? lock_timer_base.isra.24+0x7b/0xa0
Sep 17 15:55:52 learner kernel:  [<ffffffff810d5d59>] ? try_to_del_timer_sync+0x49/0x60
Sep 17 15:55:52 learner kernel:  [<ffffffffa017180d>] ? usb_start_wait_urb+0x5d/0x140 [usbcore]
Sep 17 15:55:52 learner kernel:  [<ffffffffa00ee2be>] ? rtsx_usb_send_cmd+0x5e/0x80 [rtsx_usb]
Sep 17 15:55:52 learner kernel:  [<ffffffffa00ee4a7>] ? rtsx_usb_read_register+0x67/0xb0 [rtsx_usb]
Sep 17 15:55:52 learner kernel:  [<ffffffffa0b15ac1>] ? rtsx_usb_detect_ms_card+0x61/0xe0 [rtsx_usb_ms]
Sep 17 15:55:52 learner kernel:  [<ffffffffa0b15a60>] ? rtsx_usb_ms_set_param+0x770/0x770 [rtsx_usb_ms]
Sep 17 15:55:52 learner kernel:  [<ffffffff8108ee0d>] ? kthread+0xbd/0xe0
Sep 17 15:55:52 learner kernel:  [<ffffffff81024741>] ? __switch_to+0x2b1/0x6a0
Sep 17 15:55:52 learner kernel:  [<ffffffff815a118f>] ? ret_from_fork+0x1f/0x40
Sep 17 15:55:52 learner kernel:  [<ffffffff8108ed50>] ? kthread_create_on_node+0x180/0x180

This is the rtsx_usb_detect_ms_card() routine in
drivers/memstick/host/rtsx_usb_ms.c, which runs as a kthread.  It 
doesn't do any runtime PM.  So it looks like the bug is present in both 
the MMC and MemoryStick interfaces.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ulf Hansson Sept. 19, 2016, 10:10 a.m. UTC | #3
On 18 September 2016 at 03:42, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Sat, 17 Sep 2016, Ritesh Raj Sarraf wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA512
>>
>> Hello Alan,
>>
>>
>> On Fri, 2016-09-16 at 17:40 -0400, Alan Stern wrote:
>> > We're still getting runtime suspends, but now at 2-second intervals.
>> > This is partly because the driver isn't calling
>> > pm_runtime_mark_last_busy(), but there may be more to it.  The 2-second
>> > period is the default autosuspend timeout for USB devices.  However, I
>> > don't see the activity that rtsx_usb_get_card_status() should produce
>> > when rtsx_usb_suspend() runs; I don't know why not.
>> >
>> > We're also getting occasional I/O attempts while the device is
>> > suspended.  They must be on some other pathway, not the one fixed by
>> > the patch above.  Let's see if we can find out just where they come
>> > from.
>> >
>> > Ritesh, please try applying this patch on top of the previous one.  It
>> > will produce output in the kernel log whenever these bad I/O attempts
>> > occur.  Also, enable dynamic debugging for the rtsx_usb driver:
>> >
>>
>> Please find links to the usbmon trace and the kernel trace.
>>
>> https://people.debian.org/~rrs/tmp/4.8.0-rc6ulf1alan1+.kern.log
>> https://people.debian.org/~rrs/tmp/usb-4.8.0-rc6ulf1alan1+.log.gz
>
> Well, this is pretty clear:
>
> Sep 17 15:55:52 learner kernel: CPU: 1 PID: 535 Comm: rtsx_usb_ms_1 Tainted: G     U          4.8.0-rc6ulf1alan1+ #19
> Sep 17 15:55:52 learner kernel: Hardware name: LENOVO 20344/INVALID, BIOS 96CN31WW(V1.17) 07/21/2015
> Sep 17 15:55:52 learner kernel:  0000000000000000 ffffffff81314be5 ffff8802476746c0 0000000002400000
> Sep 17 15:55:52 learner kernel:  ffffffffa016f719 00000000523bec00 ffff88025f255780 ffff88024feff600
> Sep 17 15:55:52 learner kernel:  0000000000018080 0000000000000000 ffff88025f258080 ffffffff815a0e60
> Sep 17 15:55:52 learner kernel: Call Trace:
> Sep 17 15:55:52 learner kernel:  [<ffffffff81314be5>] ? dump_stack+0x7d/0xb8
> Sep 17 15:55:52 learner kernel:  [<ffffffffa016f719>] ? usb_hcd_submit_urb+0x3c9/0xad0 [usbcore]
> Sep 17 15:55:52 learner kernel:  [<ffffffff815a0e60>] ? _raw_spin_lock_irqsave+0x20/0x47
> Sep 17 15:55:52 learner kernel:  [<ffffffff810d5c8b>] ? lock_timer_base.isra.24+0x7b/0xa0
> Sep 17 15:55:52 learner kernel:  [<ffffffff810d5d59>] ? try_to_del_timer_sync+0x49/0x60
> Sep 17 15:55:52 learner kernel:  [<ffffffffa017180d>] ? usb_start_wait_urb+0x5d/0x140 [usbcore]
> Sep 17 15:55:52 learner kernel:  [<ffffffffa00ee2be>] ? rtsx_usb_send_cmd+0x5e/0x80 [rtsx_usb]
> Sep 17 15:55:52 learner kernel:  [<ffffffffa00ee4a7>] ? rtsx_usb_read_register+0x67/0xb0 [rtsx_usb]
> Sep 17 15:55:52 learner kernel:  [<ffffffffa0b15ac1>] ? rtsx_usb_detect_ms_card+0x61/0xe0 [rtsx_usb_ms]
> Sep 17 15:55:52 learner kernel:  [<ffffffffa0b15a60>] ? rtsx_usb_ms_set_param+0x770/0x770 [rtsx_usb_ms]
> Sep 17 15:55:52 learner kernel:  [<ffffffff8108ee0d>] ? kthread+0xbd/0xe0
> Sep 17 15:55:52 learner kernel:  [<ffffffff81024741>] ? __switch_to+0x2b1/0x6a0
> Sep 17 15:55:52 learner kernel:  [<ffffffff815a118f>] ? ret_from_fork+0x1f/0x40
> Sep 17 15:55:52 learner kernel:  [<ffffffff8108ed50>] ? kthread_create_on_node+0x180/0x180
>
> This is the rtsx_usb_detect_ms_card() routine in
> drivers/memstick/host/rtsx_usb_ms.c, which runs as a kthread.  It
> doesn't do any runtime PM.  So it looks like the bug is present in both
> the MMC and MemoryStick interfaces.

I think the problem is even worse in the MemoryStick case, as the
memstick core doesn't help with runtime PM. I am pretty sure there are
other cases when the MemoryStick driver accesses the usb device
without first runtime resuming it.

Of course we could start simple an fix the bug observed above and see
if that solves the reported problem. Alan, do you want to post to
patch or you want me?

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: usb-4.x/drivers/usb/core/hcd.c
===================================================================
--- usb-4.x.orig/drivers/usb/core/hcd.c
+++ usb-4.x/drivers/usb/core/hcd.c
@@ -1647,6 +1647,8 @@  int usb_hcd_submit_urb (struct urb *urb,
 		status = map_urb_for_dma(hcd, urb, mem_flags);
 		if (likely(status == 0)) {
 			status = hcd->driver->urb_enqueue(hcd, urb, mem_flags);
+			if (status == -EHOSTUNREACH)
+				dump_stack();
 			if (unlikely(status))
 				unmap_urb_for_dma(hcd, urb);
 		}