xHCI problem? [was Re: Erratic USB device behavior and device loss]

Message ID Pine.LNX.4.44L0.1609191340320.1458-100000@iolanthe.rowland.org
State New

Commit Message

Alan Stern Sept. 19, 2016, 5:48 p.m. UTC
On Mon, 19 Sep 2016, Ulf Hansson wrote:

> On 18 September 2016 at 03:42, Alan Stern <stern@rowland.harvard.edu> wrote:

> > Well, this is pretty clear:
> >
> > Sep 17 15:55:52 learner kernel: CPU: 1 PID: 535 Comm: rtsx_usb_ms_1 Tainted: G     U          4.8.0-rc6ulf1alan1+ #19
> > Sep 17 15:55:52 learner kernel: Hardware name: LENOVO 20344/INVALID, BIOS 96CN31WW(V1.17) 07/21/2015
> > Sep 17 15:55:52 learner kernel:  0000000000000000 ffffffff81314be5 ffff8802476746c0 0000000002400000
> > Sep 17 15:55:52 learner kernel:  ffffffffa016f719 00000000523bec00 ffff88025f255780 ffff88024feff600
> > Sep 17 15:55:52 learner kernel:  0000000000018080 0000000000000000 ffff88025f258080 ffffffff815a0e60
> > Sep 17 15:55:52 learner kernel: Call Trace:
> > Sep 17 15:55:52 learner kernel:  [<ffffffff81314be5>] ? dump_stack+0x7d/0xb8
> > Sep 17 15:55:52 learner kernel:  [<ffffffffa016f719>] ? usb_hcd_submit_urb+0x3c9/0xad0 [usbcore]
> > Sep 17 15:55:52 learner kernel:  [<ffffffff815a0e60>] ? _raw_spin_lock_irqsave+0x20/0x47
> > Sep 17 15:55:52 learner kernel:  [<ffffffff810d5c8b>] ? lock_timer_base.isra.24+0x7b/0xa0
> > Sep 17 15:55:52 learner kernel:  [<ffffffff810d5d59>] ? try_to_del_timer_sync+0x49/0x60
> > Sep 17 15:55:52 learner kernel:  [<ffffffffa017180d>] ? usb_start_wait_urb+0x5d/0x140 [usbcore]
> > Sep 17 15:55:52 learner kernel:  [<ffffffffa00ee2be>] ? rtsx_usb_send_cmd+0x5e/0x80 [rtsx_usb]
> > Sep 17 15:55:52 learner kernel:  [<ffffffffa00ee4a7>] ? rtsx_usb_read_register+0x67/0xb0 [rtsx_usb]
> > Sep 17 15:55:52 learner kernel:  [<ffffffffa0b15ac1>] ? rtsx_usb_detect_ms_card+0x61/0xe0 [rtsx_usb_ms]
> > Sep 17 15:55:52 learner kernel:  [<ffffffffa0b15a60>] ? rtsx_usb_ms_set_param+0x770/0x770 [rtsx_usb_ms]
> > Sep 17 15:55:52 learner kernel:  [<ffffffff8108ee0d>] ? kthread+0xbd/0xe0
> > Sep 17 15:55:52 learner kernel:  [<ffffffff81024741>] ? __switch_to+0x2b1/0x6a0
> > Sep 17 15:55:52 learner kernel:  [<ffffffff815a118f>] ? ret_from_fork+0x1f/0x40
> > Sep 17 15:55:52 learner kernel:  [<ffffffff8108ed50>] ? kthread_create_on_node+0x180/0x180
> >
> > This is the rtsx_usb_detect_ms_card() routine in
> > drivers/memstick/host/rtsx_usb_ms.c, which runs as a kthread.  It
> > doesn't do any runtime PM.  So it looks like the bug is present in both
> > the MMC and MemoryStick interfaces.
> 
> I think the problem is even worse in the MemoryStick case, as the
> memstick core doesn't help with runtime PM. I am pretty sure there are
> other cases when the MemoryStick driver accesses the usb device
> without first runtime resuming it.

Maybe we should get a MemoryStick maintainer involved in this thread.  
I CC'ed Alex Dubov.

Alex, the problem here is that drivers/memstick/host/rtsx_usb_ms.c
tries to communicate with the host USB device while it is runtime
suspended.

> Of course we could start simple and fix the bug observed above and see
> if that solves the reported problem. Alan, do you want to post the
> patch or do you want me to?

This ought to help.  Ritesh, please apply this patch on top of the 
two earlier ones and let's see what happens.

Alan Stern




--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Ritesh Raj Sarraf Sept. 20, 2016, 12:36 p.m. UTC | #1
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Mon, 2016-09-19 at 13:48 -0400, Alan Stern wrote:
> 
> This ought to help.  Ritesh, please apply this patch on top of the 
> two earlier ones and let's see what happens.
> 
> Alan Stern
> 
> 

Please find the logs at the following links. On this boot, I did not see any
kernel stack being printed.

https://people.debian.org/~rrs/tmp/4.8.0-rc7ulf1alan2+.kern.log
https://people.debian.org/~rrs/tmp/usb-4.8.0-rc7ulf1alan2+.log


> 
> Index: usb-4.x/drivers/memstick/host/rtsx_usb_ms.c
> ===================================================================
> --- usb-4.x.orig/drivers/memstick/host/rtsx_usb_ms.c
> +++ usb-4.x/drivers/memstick/host/rtsx_usb_ms.c
> @@ -681,6 +681,7 @@ static int rtsx_usb_detect_ms_card(void
>         int err;
>  
>         for (;;) {
> +               pm_runtime_get_sync(ms_dev(host));
>                 mutex_lock(&ucr->dev_mutex);
>  
>                 /* Check pending MS card changes */
> @@ -703,6 +704,7 @@ static int rtsx_usb_detect_ms_card(void
>                 }
>  
>  poll_again:
> +               pm_runtime_put(ms_dev(host));
>                 if (host->eject)
>                         break;
>  
- -- 
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJX4S1BAAoJEKY6WKPy4XVprAUP/jPnuxUAZ+6qKiCVx6qB69d+
wHQDkFOxmlTwTh5GyMa+oxEqvi0shvOKZ/ef7Oz0NA9DSiLonFw4aqSzF7jBRbee
UTKIgnNxHmJC6pdMPXWo5HVLBn6qYtX4pJFX6g1MwmjEDa9pjYWK9p7QzHkrx1GB
Z3X7TcWYk3DJS04GbFO9pMDl0P1phLR2VtnfzQwqtgF/g2fy7USpft1bYIQLQzxb
oOSAEDnTCtpurdAfLWq8OVQbL3rrf+HD3InVtdCZa+lwNSNwNfUZWnKKkS1S1tq+
hgKxvGOTEGunhm6Px6iQUCE9yxsvfmDK2GBxc/a3Tqpcy5ndZv/5laKFhXTt27pa
OuGksYgHCf2vWGHFuYHJH6cQKxgdsnnE7yGwbC8zYnrCT9O3hcLPxbVbzJorWFU0
YMNKt7RYZXrNQss9J4ufkTSLvzbUqsiYJwWH27LbQ5zHC7b9/ebgnMW6JIb1x+2p
iuz6MERvyxVxorG3R260GWSz/5SM/VVnTqzlRUnMHcVAyUHNGPGoqLu5LkrmI2VT
Zwcikip9G3fE79786eKF50X7dp2kU2p+W2bBmcJEWpWV9Vz5PiQdibsiu3CQilKc
QGxrKLp0OSsUvtwb4ceD/RWu7F99F7VCu3f/ohYYS2iciux5sFky+27GfY0fEJ2u
ikPpuK6xNWWSDgaNVVHD
=Nq4a
-----END PGP SIGNATURE-----

Alan Stern Sept. 20, 2016, 2:16 p.m. UTC | #2
On Tue, 20 Sep 2016, Ritesh Raj Sarraf wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
> 
> On Mon, 2016-09-19 at 13:48 -0400, Alan Stern wrote:
> > 
> > This ought to help.  Ritesh, please apply this patch on top of the 
> > two earlier ones and let's see what happens.
> > 
> > Alan Stern
> > 
> > 
> 
> Please find the logs at the following links. On this boot, I did not see any
> kernel stack being printed.
> 
> https://people.debian.org/~rrs/tmp/4.8.0-rc7ulf1alan2+.kern.log
> https://people.debian.org/~rrs/tmp/usb-4.8.0-rc7ulf1alan2+.log

This is a lot better.  No more I/O errors.

We still have irregular suspends and resumes, but that's to be 
expected.  More worrying are the spontaneous disconnects.  They don't 
seem to be related to the suspend/resume activity.

You can disable suspend for this device entirely by doing:

	echo on >/sys/bus/usb/devices/2-4/power/control

I'm afraid that this won't prevent the device from disconnecting
itself, though.  This appears to be some sort of hardware bug that
can't be fixed in software.

Alan Stern

Ritesh Raj Sarraf Sept. 20, 2016, 3:17 p.m. UTC | #3
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hello Alan,

On Tue, 2016-09-20 at 10:16 -0400, Alan Stern wrote:
> This is a lot better.  No more I/O errors.
> 
> We still have irregular suspends and resumes, but that's to be 
> expected.  More worrying are the spontaneous disconnects.  They don't 
> seem to be related to the suspend/resume activity.
> 
> You can disable suspend for this device entirely by doing:
> 
>         echo on >/sys/bus/usb/devices/2-4/power/control
> 

Yes. But that'd also mean to write that value upon every suspend/resume cycle
because the rtsx usb driver still declares support for autosuspend.
Should that be dropped ?

> I'm afraid that this won't prevent the device from disconnecting
> itself, though.  This appears to be some sort of hardware bug that
> can't be fixed in software.

And that'd mean that upon every reset, the driver will again enable autosuspend
for that driver.


It is an upsetting state for this device but thank you, to all of you, for
helping debug this problem.

- -- 
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJX4VLyAAoJEKY6WKPy4XVprEgQAKdYcFIF1ICNGHkF+oyDe1VY
DqyxTa0vLPbXWsG6GaV4Jdwld+gVKiWIVNKxbLpboBHj/vj0bkzoPJ0z4xI+Oc8p
bu6T5Io6nAaegdGa6XvYIAXk1fIlXEehBFVM6OyzC1EUJAjgYhIDVlG8mqZKxqVp
GGsX1e5UFdS0vCqjYqSxI5IHrqsm1M4lXuwr8ia66qyuSfpg8trizLiWrdiEa7hv
hRtI81XxITvVJ4+2ernO6Y+RO/z6WQLs1SAhXvDEH3RlYh/RoEBpolqMIO8LVWMK
jd4GSsYmMKiG1eJJq3UYM+iPDANIIi4gdO0hf/24vNcsVa8eF5kcKT+bxufRaiB/
5TzZS0RARBdq1+N6bK/wF8lDL4bWy4Sl1mts/dXJaCOlOoLeQ/u/J55K58mDJb2O
gdvEvzD/S9NNeawL2ow4sxaM8EBpeyJtBTyVIbLJVFmetauVs+ClI0uLoNMW/N+u
qMDE8yhiey4ClXa0WmVZHN4qjfNAnW4OSMtrq8+TLa1yhVj/ONNBo8QkfjuKBHwE
ELDrX3/N9JsZo6ZX0CPNVhodvck8ZVKrk8w3jAlcqQ0FlJy5MMFoqGEvqJ1EIfNv
IQ5FKrI08RIA7yw6keTy3nybzyY3MhepLJWxiEVyKRCopiHk96DmEGJmOmor7VIQ
hjgbpCikJ7sXPcn0n97u
=DBja
-----END PGP SIGNATURE-----

Alan Stern Sept. 20, 2016, 3:43 p.m. UTC | #4
On Tue, 20 Sep 2016, Ritesh Raj Sarraf wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
> 
> Hello Alan,
> 
> On Tue, 2016-09-20 at 10:16 -0400, Alan Stern wrote:
> > This is a lot better.  No more I/O errors.
> > 
> > We still have irregular suspends and resumes, but that's to be 
> > expected.  More worrying are the spontaneous disconnects.  They don't 
> > seem to be related to the suspend/resume activity.
> > 
> > You can disable suspend for this device entirely by doing:
> > 
> >         echo on >/sys/bus/usb/devices/2-4/power/control
> > 
> 
> Yes. But that'd also mean to write that value upon every suspend/resume cycle
> because the rtsx usb driver still declares support for autosuspend.
> Should that be dropped ?

No, the value doesn't change across a suspend/resume cycle.

> > I'm afraid that this won't prevent the device from disconnecting
> > itself, though.  This appears to be some sort of hardware bug that
> > can't be fixed in software.
> 
> And that'd mean that upon every reset, the driver will again enable autosuspend
> for that driver.

Yes, that's true.  I'm curious to see if preventing autosuspends will 
get rid of the resets.  My guess is that it won't.

Alan Stern

Ritesh Raj Sarraf Sept. 20, 2016, 3:51 p.m. UTC | #5
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On Tue, 2016-09-20 at 11:43 -0400, Alan Stern wrote:
> > Yes. But that'd also mean to write that value upon every suspend/resume cycle
> > because the rtsx usb driver still declares support for autosuspend.
> > Should that be dropped ?
> 
> No, the value doesn't change across a suspend/resume cycle.
> 

I just verified, and yes, you are right. The value doesn't change.

> > > I'm afraid that this won't prevent the device from disconnecting
> > > itself, though.  This appears to be some sort of hardware bug that
> > > can't be fixed in software.
> > 
> > > And that'd mean that upon every reset, the driver will again enable autosuspend
> > > for that driver.
> 
> Yes, that's true.  I'm curious to see if preventing autosuspends will 
> get rid of the resets.  My guess is that it won't.

No. We tried it in the beginning. And the resets were still seen.

Thanks.

- -- 
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJX4Vr/AAoJEKY6WKPy4XVpqSoP/jimRbblWfR56HM3RuK6jLTp
XHm2lJj6LopxC7BDPVF+SIMULlTyPh2hjbF2MVw9yNwsv1nruGQspzZ5VrBWlTsK
8Fs2sJif7Y8tWVFEMZczghrEoHN0KLe5vW4W6rX8xjjH5nL6ljtUeBDT6DyvD7yT
WymQWfObwp6VnjoR3nZ1SzB4DN/oGH10NaMjkk234mTkhU9Pl+UXFmesDdWn8Y64
3l5SpemMbNQaCaa/jyFQBJXu3+OTYVQafHjcl0bb3aRt4sHq5neS5zc/EIjz+Cpo
kqQwpQ6FslvSvamlwwB8mqDalPQZHeIvUNFMjlldpiAs8iCVeMHpolWI/CXCfo+1
BwVv8Kc1VnoMsjZ7uEUQJY9F1Q7YJ+4gFK6WSAhz7B9Na/0ztPJgq0tFYnVQgrwx
zUnLL7jPZZ4Wt8if9UayPtCUCdqHBSIfeoJ7+HMkC6FPt5GGCsrhtZX0u0Onop7F
Ka/VNgpMUNccgPvdqq3zYKyNIaAIUPf0jSyFbwxVXGbCLSZi8f4QmSw7k3BvkqNN
lR+pyqjKbImTpzqk0QT22SGT+4MeQgclbEUkpfA8PaPyb+9uLjgtZgp1ucTlMzOV
c3mXaTzRtSihagSW4hNyqYOINtBnvZp3n2fWDPgjJu+LWGOhpqY7P8mq8gFj77Fp
G49mKNuDiOkPWH3qHJgA
=bGWs
-----END PGP SIGNATURE-----

Ritesh Raj Sarraf Sept. 21, 2016, 11:10 a.m. UTC | #6
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hi Alan,

On Tue, 2016-09-20 at 10:16 -0400, Alan Stern wrote:
> This is a lot better.  No more I/O errors.
> 
> We still have irregular suspends and resumes, but that's to be 
> expected.  More worrying are the spontaneous disconnects.  They don't 
> seem to be related to the suspend/resume activity.
> 
> You can disable suspend for this device entirely by doing:
> 
>         echo on >/sys/bus/usb/devices/2-4/power/control
> 
> I'm afraid that this won't prevent the device from disconnecting
> itself, though.  This appears to be some sort of hardware bug that
> can't be fixed in software.

I'm not sure what you were referring to when you said "No more I/O errors".
But I still got these errors today, with all patches applied.

Sep 21 14:58:11 learner kernel: usb 2-4: new high-speed USB device number 98 using xhci_hcd
Sep 21 14:58:18 learner kernel: usb 2-4: new high-speed USB device number 102 using xhci_hcd
Sep 21 14:58:24 learner kernel: usb 2-4: new high-speed USB device number 106 using xhci_hcd
Sep 21 14:58:31 learner kernel: usb 2-4: new high-speed USB device number 114 using xhci_hcd
Sep 21 14:58:41 learner kernel: usb 2-4: new high-speed USB device number 12 using xhci_hcd
Sep 21 14:58:41 learner kernel: usb 2-4: device descriptor read/64, error -71
Sep 21 14:58:41 learner kernel: usb 2-4: device descriptor read/64, error -71
Sep 21 14:58:41 learner kernel: usb 2-4: new high-speed USB device number 13 using xhci_hcd
Sep 21 14:58:41 learner kernel: usb 2-4: device descriptor read/64, error -71
Sep 21 14:58:41 learner kernel: usb 2-4: device descriptor read/64, error -71
Sep 21 14:58:42 learner kernel: usb 2-4: new high-speed USB device number 14 using xhci_hcd
Sep 21 14:58:42 learner kernel: usb 2-4: Device not responding to setup address.
Sep 21 14:58:42 learner kernel: usb 2-4: Device not responding to setup address.
Sep 21 14:58:42 learner kernel: usb 2-4: device not accepting address 14, error -71
Sep 21 14:58:42 learner kernel: usb 2-4: new high-speed USB device number 15 using xhci_hcd
Sep 21 14:58:42 learner kernel: usb 2-4: Device not responding to setup address.
Sep 21 14:58:42 learner kernel: usb 2-4: Device not responding to setup address.
Sep 21 14:58:43 learner kernel: usb 2-4: device not accepting address 15, error -71
Sep 21 14:58:43 learner kernel: usb usb2-port4: unable to enumerate USB device
Sep 21 16:19:39 learner kernel: ahci 0000:00:1f.2: port does not support device sleep
Sep 21 16:19:39 learner kernel: NMI watchdog: enabled on all CPUs, permanently consumes one hw-
Sep 21 16:19:39 learner kernel: EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,data=ordere
Sep 21 16:19:39 learner kernel: EXT4-fs (sda6): re-mounted. Opts: data=ordered,commit=0
Sep 21 16:19:39 learner kernel: EXT4-fs (dm-3): re-mounted. Opts: errors=remount-ro,data=writeb
Sep 21 16:19:39 learner kernel: usb 2-4: new high-speed USB device number 16 using xhci_hcd
Sep 21 16:19:39 learner kernel: usb 2-4: device descriptor read/64, error -71
Sep 21 16:19:40 learner kernel: usb 2-4: device descriptor read/64, error -71
Sep 21 16:19:40 learner kernel: usb 2-4: new high-speed USB device number 17 using xhci_hcd
Sep 21 16:19:40 learner kernel: usb 2-4: device descriptor read/64, error -71
Sep 21 16:19:40 learner kernel: usb 2-4: device descriptor read/64, error -71
Sep 21 16:19:40 learner kernel: usb 2-4: new high-speed USB device number 18 using xhci_hcd
Sep 21 16:19:40 learner kernel: usb 2-4: Device not responding to setup address.
Sep 21 16:19:41 learner kernel: usb 2-4: Device not responding to setup address.
Sep 21 16:19:41 learner kernel: usb 2-4: device not accepting address 18, error -71
Sep 21 16:19:41 learner kernel: usb 2-4: new high-speed USB device number 19 using xhci_hcd


- -- 
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJX4mqUAAoJEKY6WKPy4XVpEHMP/RhcDQXxt3LTOpGhJizyqZ4z
7Sm1tcBe/4NKP80nUpiI0geQYHYfRTR93hGKFayp48ULstn8xJ8T3ItZIS0WmZDK
TJcdxXzrkWMGNAGQFjNd9Lk1C7h1IIuo2D5xDuhrpHGMc5y4UVmpPixQRwEnbzG9
zX+PabvummAmlzT1+cRyO10uwpGFzsJ3SDkokjkxZ/aViL+vBU58/qiXIFH1D1hX
KTY8ABZjh4Hnkw07EcQh0xKztEbE/v2wJWPSx4RCPbsRdO5vdKUtOtWB7+1WVAY3
noSrvNWjj0Ntnm0+t4XIid1fDmNumK0EcYe8fDb/GqAuYDTqjcIZ5ANCaVSM/joq
suY7KTXVe44Pol1Bb89lERR49QAkxyKJViNc0bNSkp0+F4u4cDW9o0q6s0X6xw5b
LdAQHQek92IRNmT7v4gYO9bUKUBurqgHuUdi3iYlylbvs8UAzHmOL3nrFBz2GIcG
KQvqmvENy31VIlIMx+k3SipyedG77LIAmxX8bG7Xlu8lSZz3sPkMz7RJYeW0QwQ6
lC2cWiF2cn5K/0eTQPW3MX5H9m5qlq0QGaDrf8kGX6XpRKR3Qsu98L+R+AAmViQ9
kd2eBFzL4JdNVhXgNWrNk5mr0R0D9RB/58YWize3sASPg75zCFQCNOoFTPNZGN1q
edPs8uwkN5O2cy+0ur8n
=uZvc
-----END PGP SIGNATURE-----

Ulf Hansson Sept. 21, 2016, 11:17 a.m. UTC | #7
On 21 September 2016 at 13:10, Ritesh Raj Sarraf <rrs@researchut.com> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> Hi Alan,
>
> On Tue, 2016-09-20 at 10:16 -0400, Alan Stern wrote:
>> This is a lot better.  No more I/O errors.
>>
>> We still have irregular suspends and resumes, but that's to be
>> expected.  More worrying are the spontaneous disconnects.  They don't
>> seem to be related to the suspend/resume activity.
>>
>> You can disable suspend for this device entirely by doing:
>>
>>         echo on >/sys/bus/usb/devices/2-4/power/control
>>
>> I'm afraid that this won't prevent the device from disconnecting
>> itself, though.  This appears to be some sort of hardware bug that
>> can't be fixed in software.
>
> I'm not sure what you were referring to when you said "No more I/O errors".
> But I still got these errors today, with all patches applied.
>
> Sep 21 14:58:11 learner kernel: usb 2-4: new high-speed USB device number 98 using xhci_hcd
> Sep 21 14:58:18 learner kernel: usb 2-4: new high-speed USB device number 102 using xhci_hcd
> Sep 21 14:58:24 learner kernel: usb 2-4: new high-speed USB device number 106 using xhci_hcd
> Sep 21 14:58:31 learner kernel: usb 2-4: new high-speed USB device number 114 using xhci_hcd
> Sep 21 14:58:41 learner kernel: usb 2-4: new high-speed USB device number 12 using xhci_hcd
> Sep 21 14:58:41 learner kernel: usb 2-4: device descriptor read/64, error -71
> Sep 21 14:58:41 learner kernel: usb 2-4: device descriptor read/64, error -71
> Sep 21 14:58:41 learner kernel: usb 2-4: new high-speed USB device number 13 using xhci_hcd
> Sep 21 14:58:41 learner kernel: usb 2-4: device descriptor read/64, error -71
> Sep 21 14:58:41 learner kernel: usb 2-4: device descriptor read/64, error -71
> Sep 21 14:58:42 learner kernel: usb 2-4: new high-speed USB device number 14 using xhci_hcd
> Sep 21 14:58:42 learner kernel: usb 2-4: Device not responding to setup address.
> Sep 21 14:58:42 learner kernel: usb 2-4: Device not responding to setup address.
> Sep 21 14:58:42 learner kernel: usb 2-4: device not accepting address 14, error -71
> Sep 21 14:58:42 learner kernel: usb 2-4: new high-speed USB device number 15 using xhci_hcd
> Sep 21 14:58:42 learner kernel: usb 2-4: Device not responding to setup address.
> Sep 21 14:58:42 learner kernel: usb 2-4: Device not responding to setup address.
> Sep 21 14:58:43 learner kernel: usb 2-4: device not accepting address 15, error -71
> Sep 21 14:58:43 learner kernel: usb usb2-port4: unable to enumerate USB device
> Sep 21 16:19:39 learner kernel: ahci 0000:00:1f.2: port does not support device sleep
> Sep 21 16:19:39 learner kernel: NMI watchdog: enabled on all CPUs, permanently consumes one hw-
> Sep 21 16:19:39 learner kernel: EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro,data=ordere
> Sep 21 16:19:39 learner kernel: EXT4-fs (sda6): re-mounted. Opts: data=ordered,commit=0
> Sep 21 16:19:39 learner kernel: EXT4-fs (dm-3): re-mounted. Opts: errors=remount-ro,data=writeb
> Sep 21 16:19:39 learner kernel: usb 2-4: new high-speed USB device number 16 using xhci_hcd
> Sep 21 16:19:39 learner kernel: usb 2-4: device descriptor read/64, error -71
> Sep 21 16:19:40 learner kernel: usb 2-4: device descriptor read/64, error -71
> Sep 21 16:19:40 learner kernel: usb 2-4: new high-speed USB device number 17 using xhci_hcd
> Sep 21 16:19:40 learner kernel: usb 2-4: device descriptor read/64, error -71
> Sep 21 16:19:40 learner kernel: usb 2-4: device descriptor read/64, error -71
> Sep 21 16:19:40 learner kernel: usb 2-4: new high-speed USB device number 18 using xhci_hcd
> Sep 21 16:19:40 learner kernel: usb 2-4: Device not responding to setup address.
> Sep 21 16:19:41 learner kernel: usb 2-4: Device not responding to setup address.
> Sep 21 16:19:41 learner kernel: usb 2-4: device not accepting address 18, error -71
> Sep 21 16:19:41 learner kernel: usb 2-4: new high-speed USB device number 19 using xhci_hcd
>

I am pretty sure the memstick driver causes additional accesses to the
usb device without first calling pm_runtime_get_sync(). To rule those
cases out as the cause of the issues, could you try disabling the
memstick driver altogether?

Kind regards
Uffe
Ritesh Raj Sarraf Sept. 21, 2016, 11:42 a.m. UTC | #8
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hello Ulf,

On Wed, 2016-09-21 at 13:17 +0200, Ulf Hansson wrote:
> 
> I am pretty sure the memstick driver causes additional accesses to the
> usb device without first calling pm_runtime_get_sync(). To rule those
> cases out as the cause of the issues, could you try disabling the
> memstick driver altogether?

I'm assuming you are referring to the rtsx_usb_ms driver ?

The oddest thing right now is that none of the rtsx modules are
reported as loaded.

rrs@learner:~$ lsmod | grep -i rts
2016-09-21 / 17:07:08 ♒♒♒  ☹  => 1  

whereas the module was built for the kernel, and does load when asked manually.

rrs@learner:~$ less /boot/config-4.8.0-rc7alxb+ 
2016-09-21 / 17:07:54 ♒♒♒  ☺  
rrs@learner:~$ find /lib/modules/4.8.0-rc7alxb+/ | grep rtsx
/lib/modules/4.8.0-rc7alxb+/kernel/drivers/mmc/host/rtsx_usb_sdmmc.ko
/lib/modules/4.8.0-rc7alxb+/kernel/drivers/mmc/host/rtsx_pci_sdmmc.ko
/lib/modules/4.8.0-rc7alxb+/kernel/drivers/mfd/rtsx_usb.ko
/lib/modules/4.8.0-rc7alxb+/kernel/drivers/mfd/rtsx_pci.ko
/lib/modules/4.8.0-rc7alxb+/kernel/drivers/memstick/host/rtsx_pci_ms.ko
/lib/modules/4.8.0-rc7alxb+/kernel/drivers/memstick/host/rtsx_usb_ms.ko
2016-09-21 / 17:08:09 ♒♒♒  ☺  

rrs@learner:~$ sudo modprobe rtsx-usb-sdmmc
2016-09-21 / 17:08:46 ♒♒♒  ☺  

rrs@learner:~$ dmesg | tail -n 5
[ 6870.017311] usb 2-4: Device not responding to setup address.
[ 6870.223471] usb 2-4: device not accepting address 19, error -71
[ 6870.223536] usb usb2-port4: unable to enumerate USB device
[ 7958.474543] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=166356 end=166357) time 3 us, min 1073, max 1079, scanline start 1088, end 1080
[ 9814.785241] usbcore: registered new interface driver rtsx_usb
2016-09-21 / 17:10:07 ♒♒♒  ☺  



- -- 
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJX4nIMAAoJEKY6WKPy4XVpg4gP/Rkp5Wje2vTUHwnrJZzAKd65
V1VWkTyCfpbCTw4nAyMbzDevyoXJf3P+ktrcyQonvoZr/dsd/LOmjSXcjNJaFX0z
Vj+1lGOZeN1mr6vKMbh2y188oU4RyHkeCfg7SNdo1VuhVST/m2jOecCznXuEtpiS
TI66lmre0SNRusKHRQNDtaP4hFW0KDBmtXvc8pnNOL5781qua0pF12VIP5SsqriP
3d/DcqlVR0Lqh7X7wdt5Knp9ilSvsrCgGbimBarRUYnrOrhklJH9UwKXcHWVypzt
M0XvjO7R6JrBUoM6s8EA6gKmNxwlsIrjKFUFlTT6HPLb52fjpYkbYqCRUmqyIJZB
t3uA0hNKcqLav+Fg7ugT6ePAPeVkANDnbrPE+g69KOtM/CEJHbHkLxqznb/0lpMU
+SAp/Jz1CmDLt8M3s8gS9iCUWVrWy1oyDpMQsYIrTYOG6oOEOoTYEf/g5D6PBtY0
r1tD5bU/cZJV61YKer2xRDNu1YbAYkvX3XskFD7DFsnZpCyBXKGZ7gWpWety9kJJ
iBiqvn5Rk7jvL6EhIb/TQ867QLmhQCzZumPClFM4z7b8G2E440vykw5D5sKv91+H
qu9Abcxe0R0T9pxCFRz+//DYlxGvDDlUSyNBkrv6aGS1rNeDY/e4FWbNuEz9UoGn
LNiunOtwzBndajfEFETk
=tEAz
-----END PGP SIGNATURE-----

Alan Stern Sept. 21, 2016, 2:37 p.m. UTC | #9
On Wed, 21 Sep 2016, Ritesh Raj Sarraf wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
> 
> Hi Alan,
> 
> On Tue, 2016-09-20 at 10:16 -0400, Alan Stern wrote:
> > This is a lot better.  No more I/O errors.
> > 
> > We still have irregular suspends and resumes, but that's to be 
> > expected.  More worrying are the spontaneous disconnects.  They don't 
> > seem to be related to the suspend/resume activity.
> > 
> > You can disable suspend for this device entirely by doing:
> > 
> >         echo on >/sys/bus/usb/devices/2-4/power/control
> > 
> > I'm afraid that this won't prevent the device from disconnecting
> > itself, though.  This appears to be some sort of hardware bug that
> > can't be fixed in software.
> 
> I'm not sure what you were referring to when you said "No more I/O errors".

I was referring to the attempts at I/O while the device was suspended.  
They didn't occur in your most recent test.

> But I still got these errors today, with all patches applied.
> 
> Sep 21 14:58:11 learner kernel: usb 2-4: new high-speed USB device number 98 using xhci_hcd
> Sep 21 14:58:18 learner kernel: usb 2-4: new high-speed USB device number 102 using xhci_hcd
> Sep 21 14:58:24 learner kernel: usb 2-4: new high-speed USB device number 106 using xhci_hcd
> Sep 21 14:58:31 learner kernel: usb 2-4: new high-speed USB device number 114 using xhci_hcd
> Sep 21 14:58:41 learner kernel: usb 2-4: new high-speed USB device number 12 using xhci_hcd
> Sep 21 14:58:41 learner kernel: usb 2-4: device descriptor read/64, error -71
> Sep 21 14:58:41 learner kernel: usb 2-4: device descriptor read/64, error -71
> Sep 21 14:58:41 learner kernel: usb 2-4: new high-speed USB device number 13 using xhci_hcd

These are a completely different kind of error.  They occurred during a 
reset, which followed one of those spontaneous disconnects.  Probably 
the cause of the disconnect is also the cause of these errors.

Alan Stern

Patch

Index: usb-4.x/drivers/memstick/host/rtsx_usb_ms.c
===================================================================
--- usb-4.x.orig/drivers/memstick/host/rtsx_usb_ms.c
+++ usb-4.x/drivers/memstick/host/rtsx_usb_ms.c
@@ -681,6 +681,7 @@  static int rtsx_usb_detect_ms_card(void
 	int err;
 
 	for (;;) {
+		pm_runtime_get_sync(ms_dev(host));
 		mutex_lock(&ucr->dev_mutex);
 
 		/* Check pending MS card changes */
@@ -703,6 +704,7 @@  static int rtsx_usb_detect_ms_card(void
 		}
 
 poll_again:
+		pm_runtime_put(ms_dev(host));
 		if (host->eject)
 			break;
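
For readers unfamiliar with the pattern, the effect of the two added calls can be sketched in userspace C. This is only an illustrative model, not kernel code: struct mock_dev and every mock_* and poll_* name below are hypothetical, and the model suspends the device the moment the usage count reaches zero, whereas the real pm_runtime_put() only queues an idle check. It shows why a register read issued outside a get/put bracket can hit a suspended device, while the bracketed read cannot.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of a runtime-PM-managed device.
 * None of these names exist in the kernel; they only mimic the
 * usage-counter behaviour of pm_runtime_get_sync()/pm_runtime_put().
 */
struct mock_dev {
	int usage_count;	/* number of active users */
	bool suspended;		/* true while "runtime suspended" */
	int bad_accesses;	/* I/O attempts made while suspended */
};

/* Model of pm_runtime_get_sync(): take a reference and resume. */
static void mock_pm_get_sync(struct mock_dev *dev)
{
	dev->usage_count++;
	dev->suspended = false;
}

/*
 * Model of pm_runtime_put(): drop a reference. Simplification: the
 * model suspends immediately once the count reaches zero.
 */
static void mock_pm_put(struct mock_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count == 0)
		dev->suspended = true;
}

/* Model of a register read like the one the card-detect thread issues. */
static int mock_read_register(struct mock_dev *dev)
{
	if (dev->suspended) {
		dev->bad_accesses++;	/* I/O to a suspended device */
		return -1;
	}
	return 0;
}

/* One polling iteration without the get/put bracket. */
static int poll_unprotected(struct mock_dev *dev)
{
	return mock_read_register(dev);
}

/* One polling iteration with the get/put bracket the patch adds. */
static int poll_protected(struct mock_dev *dev)
{
	int err;

	mock_pm_get_sync(dev);
	err = mock_read_register(dev);
	mock_pm_put(dev);
	return err;
}
```

Starting from a suspended mock_dev, poll_unprotected() fails and records a bad access, while poll_protected() succeeds and leaves the usage count balanced at zero afterwards.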