diff mbox

xHCI problem? [was Re: Erratic USB device behavior and device loss]

Message ID CAPDyKFo5E=MXGi7Zk84mfbrjKVc-68OsKv6jBCGKgvybj9jMfQ@mail.gmail.com
State New
Headers show

Commit Message

Ulf Hansson Sept. 15, 2016, 1:59 p.m. UTC
On 14 September 2016 at 17:19, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Wed, 14 Sep 2016, Ritesh Raj Sarraf wrote:
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA512
>>
>> Hello Ulf and Alan,
>>
>> On Fri, 2016-09-09 at 19:34 +0530, Ritesh Raj Sarraf wrote:
>> > For #2, I'm building the 4.8-rc5 kernel with the following change. This build
>> > does not include the previous change you had suggested (related to
>> > POWER_CYCLE)
>> >
>> > Date:   Fri Sep 9 19:28:03 2016 +0530
>> >
>> >     Disable pm runtime in mmc core
>> >
>> > diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
>> > index e55cde6..32388d5 100644
>> > --- a/drivers/mmc/core/core.c
>> > +++ b/drivers/mmc/core/core.c
>> > @@ -970,9 +970,6 @@ int __mmc_claim_host(struct mmc_host *host, atomic_t
>> > *abort)
>> >         spin_unlock_irqrestore(&host->lock, flags);
>> >         remove_wait_queue(&host->wq, &wait);
>> >
>> > -       if (pm)
>> > -               pm_runtime_get_sync(mmc_dev(host));
>> > -
>> >         return stop;
>> >  }
>> >  EXPORT_SYMBOL(__mmc_claim_host);
>> > @@ -1000,7 +997,6 @@ void mmc_release_host(struct mmc_host *host)
>> >                 spin_unlock_irqrestore(&host->lock, flags);
>> >                 wake_up(&host->wq);
>> >                 pm_runtime_mark_last_busy(mmc_dev(host));
>> > -               pm_runtime_put_autosuspend(mmc_dev(host));
>> >         }
>> >  }
>> >  EXPORT_SYMBOL(mmc_release_host);
>>
>> I tried with these changes on 4.8-rc6 and I only saw 2 resets so far.
>> I captured the usb trace [1], just in case if you need it.
>>
>> [1] https://people.debian.org/~rrs/tmp/4.8-rc6-ulf.txt.gz
>
> The situation isn't any better.  At the start of the trace,
> the device is in runtime suspend but there are many attempts to
> communicate with it, all of which fail.

It's really weird. Have this driver ever worked!? :-)

>
> Then a little less than an hour after the trace started, the device was
> resumed.  At that point it started working okay.  Until there was a
> spontaneous disconnect.
>
> The device reconnected, but after 3 seconds it was runtime suspended
> again -- and the I/O attempts continued.  Some time later there was
> another runtime resume, and the device began working again.  Until
> another spontaneous disconnect occurred.  And so on...
>
> Ulf, we really need to figure out why the autosuspends are occurring
> and why the I/O doesn't stop while the device is suspended.

Okay, let's see.

I had another look in the rtsx_usb_sdmmc driver. Apparently it
registers a led classdev. Updating the led is done from a work, by
calling rtsx_usb_turn_on|off_led(), which do access the usb device.
These calls are not properly managed by runtime PM, so I have fixed
those according to below change:

---
 drivers/mmc/host/rtsx_usb_sdmmc.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Alan Stern Sept. 15, 2016, 2:16 p.m. UTC | #1
On Thu, 15 Sep 2016, Ulf Hansson wrote:

> > The situation isn't any better.  At the start of the trace,
> > the device is in runtime suspend but there are many attempts to
> > communicate with it, all of which fail.
> 
> It's really weird. Have this driver ever worked!? :-)

Probably not.  Or at least, not with runtime PM.

> > Then a little less than an hour after the trace started, the device was
> > resumed.  At that point it started working okay.  Until there was a
> > spontaneous disconnect.
> >
> > The device reconnected, but after 3 seconds it was runtime suspended
> > again -- and the I/O attempts continued.  Some time later there was
> > another runtime resume, and the device began working again.  Until
> > another spontaneous disconnect occurred.  And so on...
> >
> > Ulf, we really need to figure out why the autosuspends are occurring
> > and why the I/O doesn't stop while the device is suspended.
> 
> Okay, let's see.
> 
> I had another look in the rtsx_usb_sdmmc driver. Apparently it
> registers a led classdev. Updating the led is done from a work, by
> calling rtsx_usb_turn_on|off_led(), which do access the usb device.
> These calls are not properly managed by runtime PM, so I have fixed
> those according to below change:
> 
> ---
>  drivers/mmc/host/rtsx_usb_sdmmc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/mmc/host/rtsx_usb_sdmmc.c
> b/drivers/mmc/host/rtsx_usb_sdmmc.c
> index 6c71fc9..a59c7fa 100644
> --- a/drivers/mmc/host/rtsx_usb_sdmmc.c
> +++ b/drivers/mmc/host/rtsx_usb_sdmmc.c
> @@ -1314,6 +1314,7 @@ static void rtsx_usb_update_led(struct work_struct *work)
>                 container_of(work, struct rtsx_usb_sdmmc, led_work);
>         struct rtsx_ucr *ucr = host->ucr;
> 
> +       pm_runtime_get_sync(sdmmc_dev(host));
>         mutex_lock(&ucr->dev_mutex);
> 
>         if (host->led.brightness == LED_OFF)
> @@ -1322,6 +1323,7 @@ static void rtsx_usb_update_led(struct work_struct *work)
>                 rtsx_usb_turn_on_led(ucr);
> 
>         mutex_unlock(&ucr->dev_mutex);
> +       pm_runtime_put(sdmmc_dev(host));
>  }
>  #endif
> 
> -- 
> 
> Although, I doubt the above is the main reason to the issues we see.

I don't know -- it could well be the reason.  The symptoms are 
definitely what you would expect to see if some thread was doing I/O 
without calling the pm_runtime_* routines.

> Instead I think somehow the parent device (usb device) isn't being
> properly managed through runtime PM, but not due to wrong deployment
> in the mmc core nor in the rtsx_usb_driver, but at some place else.
> :-)
> 
> I started looking for calls to pm_suspend_ignore_children(dev, true),
> which would decouple the usb device from the mmc platform device from
> a runtime PM point of view. I found one suspicious case!
> 
> drivers/usb/storage/realtek_cr.c:
> pm_suspend_ignore_children(&us->pusb_intf->dev, true);
> 
> As I am not so familiar with USB, I can't really tell why the above
> exists, although perhaps just removing that line would be worth a
> try!?

No, the realtek_cr driver has no connection with this.  It's a
sub-module of the usb_storage driver; it uses the SCSI interface,
not the MMC interface.

> If neither of the above works, the next step could be to start
> checking error codes in the mmc core and in the rtsx_usb_sdmmc driver,
> from the calls to pm_runtime_get|put() and pm_runtime_enable().

Let's see what this patch does.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ritesh Raj Sarraf Sept. 16, 2016, 3:42 p.m. UTC | #2
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hello Ulf and Alan,

On Thu, 2016-09-15 at 10:16 -0400, Alan Stern wrote:
> > ---
> >  drivers/mmc/host/rtsx_usb_sdmmc.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/mmc/host/rtsx_usb_sdmmc.c
> > b/drivers/mmc/host/rtsx_usb_sdmmc.c
> > index 6c71fc9..a59c7fa 100644
> > --- a/drivers/mmc/host/rtsx_usb_sdmmc.c
> > +++ b/drivers/mmc/host/rtsx_usb_sdmmc.c
> > @@ -1314,6 +1314,7 @@ static void rtsx_usb_update_led(struct work_struct
> *work)
> >                 container_of(work, struct rtsx_usb_sdmmc, led_work);
> >         struct rtsx_ucr *ucr = host->ucr;
> > 
> > +       pm_runtime_get_sync(sdmmc_dev(host));
> >         mutex_lock(&ucr->dev_mutex);
> > 
> >         if (host->led.brightness == LED_OFF)
> > @@ -1322,6 +1323,7 @@ static void rtsx_usb_update_led(struct work_struct
> *work)
> >                 rtsx_usb_turn_on_led(ucr);
> > 
> >         mutex_unlock(&ucr->dev_mutex);
> > +       pm_runtime_put(sdmmc_dev(host));
> >  }
> >  #endif
> > 
> > -- 
> > 
> > Although, I doubt the above is the main reason to the issues we see.
> 
> I don't know -- it could well be the reason.  The symptoms are 
> definitely what you would expect to see if some thread was doing I/O 
> without calling the pm_runtime_* routines.
> 
> > Instead I think somehow the parent device (usb device) isn't being
> > properly managed through runtime PM, but not due to wrong deployment
> > in the mmc core nor in the rtsx_usb_driver, but at some place else.
> > :-)
> > 
> > I started looking for calls to pm_suspend_ignore_children(dev, true),
> > which would decouple the usb device from the mmc platform device from
> > a runtime PM point of view. I found one suspicious case!
> > 
> > drivers/usb/storage/realtek_cr.c:
> > pm_suspend_ignore_children(&us->pusb_intf->dev, true);
> > 
> > As I am not so familiar with USB, I can't really tell why the above
> > exists, although perhaps just removing that line would be worth a
> > try!?
> 
> No, the realtek_cr driver has no connection with this.  It's a
> sub-module of the usb_storage driver; it uses the SCSI interface,
> not the MMC interface.
> 
> > If neither of the above works, the next step could be to start
> > checking error codes in the mmc core and in the rtsx_usb_sdmmc driver,
> > from the calls to pm_runtime_get|put() and pm_runtime_enable().
> 
> Let's see what this patch does.

I was able to hit it again. Please find the usbmon trace at:
https://people.debian.org/~rrs/tmp/usb-4.8.0-rc6ulf1+.log.gz


- -- 
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."
-----BEGIN PGP SIGNATURE-----

iQIcBAEBCgAGBQJX3BLOAAoJEKY6WKPy4XVpXMgP/jTyKOX/SYCPTU9twYldY7LQ
f64hpiWqXOUs+jFYM+BcrF5B5DuXiB1Wm4F3+Xm/QBN3grJD7yBq1nrhv/mAhCr3
y1gFRIbeKfZsEp1vdBov9m1jQCZzzIZlFXPmRGT/8uC/GZTHlgIeSLqBntpq9+yL
MQSE91tLVayVgaOQxpPz+uZ4PTAom19sU21Haa90ECHLKAUTJ9WncQFecjPLHMjb
4SUvgq53V2s1Yo1E85RhtgR6Nrk/Bh7qZEC1NyeganLazGbbsz9YnRcGy58x9Jiq
xmfURTtvG834CnGcGuzcRU09FGPMtXx/u57EYC6mdEMWhSglo0h6YhVxcUOtAhRD
s1gs+a6ToKTDLn6qr0cnIwG27ALyLh41QmzxEpiaZiugIEBzZ/uK3TBjzcul4Huj
v0+x2fSC0SXwGo4P3GAOnHuWUjgj3C1wElP1R3brXfO0aayESUNKzE8V7RbQIWiC
mHewSlKTiPwCr/lchaINTt2TyFcHJWOx90iV10GO5TpMyqho4AzpBpoimItrbx2t
qQJCvGzDLPjr0tPvpeWyJSfBnqCDqbJ44CY3nCFgKhTd3BXp4fDj09eBtNmSiuvu
UdZZxm84FD3BDSNX8k2W9CF81jML/4lzwliJge3uIPrXNDqGSZMxDSpd0u1EFNHf
rEQ/kP1WlArvqButQ5ZN
=fiWV
-----END PGP SIGNATURE-----

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/mmc/host/rtsx_usb_sdmmc.c
b/drivers/mmc/host/rtsx_usb_sdmmc.c
index 6c71fc9..a59c7fa 100644
--- a/drivers/mmc/host/rtsx_usb_sdmmc.c
+++ b/drivers/mmc/host/rtsx_usb_sdmmc.c
@@ -1314,6 +1314,7 @@  static void rtsx_usb_update_led(struct work_struct *work)
                container_of(work, struct rtsx_usb_sdmmc, led_work);
        struct rtsx_ucr *ucr = host->ucr;

+       pm_runtime_get_sync(sdmmc_dev(host));
        mutex_lock(&ucr->dev_mutex);

        if (host->led.brightness == LED_OFF)
@@ -1322,6 +1323,7 @@  static void rtsx_usb_update_led(struct work_struct *work)
                rtsx_usb_turn_on_led(ucr);

        mutex_unlock(&ucr->dev_mutex);
+       pm_runtime_put(sdmmc_dev(host));
 }
 #endif