diff mbox

USB: core: Add warm reset while reset-resuming SuperSpeed HUBs

Message ID CAA9_cme4XmXqQfx9kTPEJWdx2XwBZk-G8s7pjSuZ5KWg3UZZ8Q@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dan Williams Dec. 11, 2013, 10:51 p.m. UTC
On Wed, Dec 11, 2013 at 11:36 AM, Sarah Sharp
<sarah.a.sharp@linux.intel.com> wrote:
> On Wed, Dec 11, 2013 at 11:00:13AM -0800, Julius Werner wrote:
>> > I don't know what you mean by "fails".  The system goes to sleep and
>> > then later on wakes up, doesn't it?
>> >
>> > Do you mean that the Jetflash device gets disconnected when the system
>> > wakes up?  That's _supposed_ to happen under those circumstances.
>> > When hub_activate() sees HUB_RESET_RESUME, all child devices get
>> > disconnected except those where udev->persist_enabled is set.
>>
>> This patch was written in response to the same bug as my "usb: hub:
>> Use correct reset for wedged USB3 devices that are NOTATTACHED"
>> submission. My patch only helps when the port gets stuck in Compliance
>> Mode, but Vikas reports that he can sometimes see it stuck in Polling
>> or Recovery states as well.
>>
>> The underlying issue is a deadlock in the USB 3.0 link training state
>> machine when the host controller is unilaterally reset on resume
>> (without driving a reset on the bus).
>
> The xHCI spec requires that when the xHCI host is reset, a USB reset is
> driven down the USB 3.0 ports.  If hot reset fails, the port may migrate
> to warm reset.  See table 32 in the xHCI spec, in the definition of
> HCRST.  It sounds like this host doesn't drive a USB reset down USB 3.0
> ports at all on host controller reset?

...although, the spec says that it does not wait for the port resets
to complete.  As far as I can see re-issuing a warm reset and waiting
is the only way to guarantee the core times the recovery.  Presumably
the portstatus debounce in hub_activate() mitigates this, but that
100ms is less than a full reset timeout.  I have something similar in
the port power rework patches [1], but I think something like the
following (untested) is more generic, it arranges for reset_resume to
start with a warm reset if necessary.

(also attached)

Comments

Dan Williams Dec. 11, 2013, 11:38 p.m. UTC | #1
On Wed, Dec 11, 2013 at 2:51 PM, Dan Williams <dan.j.williams@gmail.com> wrote:
> On Wed, Dec 11, 2013 at 11:36 AM, Sarah Sharp
> <sarah.a.sharp@linux.intel.com> wrote:
>> On Wed, Dec 11, 2013 at 11:00:13AM -0800, Julius Werner wrote:
>>> > I don't know what you mean by "fails".  The system goes to sleep and
>>> > then later on wakes up, doesn't it?
>>> >
>>> > Do you mean that the Jetflash device gets disconnected when the system
>>> > wakes up?  That's _supposed_ to happen under those circumstances.
>>> > When hub_activate() sees HUB_RESET_RESUME, all child devices get
>>> > disconnected except those where udev->persist_enabled is set.
>>>
>>> This patch was written in response to the same bug as my "usb: hub:
>>> Use correct reset for wedged USB3 devices that are NOTATTACHED"
>>> submission. My patch only helps when the port gets stuck in Compliance
>>> Mode, but Vikas reports that he can sometimes see it stuck in Polling
>>> or Recovery states as well.
>>>
>>> The underlying issue is a deadlock in the USB 3.0 link training state
>>> machine when the host controller is unilaterally reset on resume
>>> (without driving a reset on the bus).
>>
>> The xHCI spec requires that when the xHCI host is reset, a USB reset is
>> driven down the USB 3.0 ports.  If hot reset fails, the port may migrate
>> to warm reset.  See table 32 in the xHCI spec, in the definition of
>> HCRST.  It sounds like this host doesn't drive a USB reset down USB 3.0
>> ports at all on host controller reset?
>
> ...although, the spec says that it does not wait for the port resets
> to complete.  As far as I can see re-issuing a warm reset and waiting
> is the only way to guarantee the core times the recovery.  Presumably
> the portstatus debounce in hub_activate() mitigates this, but that
> 100ms is less than a full reset timeout.  I have something similar in
> the port power rework patches [1], but I think something like the
> following (untested) is more generic, it arranges for reset_resume to
> start with a warm reset if necessary.
>
> (also attached)
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index a7c04e24ca48..30ce237569dd 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -2783,8 +2783,14 @@ static int check_port_resume_type(struct
> usb_device *udev,
>                 struct usb_hub *hub, int port1,
>                 int status, unsigned portchange, unsigned portstatus)
>  {
> +       /* Did the port go SS.Inactive?  Even if ->persist_enabled is
> cleared the
> +        * device won't come back until a warm reset completes
> +        */
> +       if (hub_port_warm_reset_required(hub, portstatus)) {
> +               udev->reset_resume = 1;
> +               udev->reset_resume_warm = 1;

Also need to set 'status' to 0 here.

If it's truly just a case of waiting for port warm resets to complete
it might be better to inject additional debounce delay here, but the
spec seems to indicate that there is no way to know that escalated
warm resets are in progress.  4.19.5.1 says "The Port Reset (PR) flag
shall be ‘1’ while Hot or Warm Reset is being executed. The Port Reset
Change (PRC) flag shall be set (‘1’) when the reset execution is
complete and PR transitions to ‘0’."... but that is only if software
initiated the warm reset.  When the warm reset was the result of an
HCRST we hit "Note, the completion of the xHC reset process is not
gated by the Root Hub port reset process."
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julius Werner Dec. 12, 2013, 7:01 a.m. UTC | #2
>> ...although, the spec says that it does not wait for the port resets
>> to complete.  As far as I can see re-issuing a warm reset and waiting
>> is the only way to guarantee the core times the recovery.  Presumably
>> the portstatus debounce in hub_activate() mitigates this, but that
>> 100ms is less than a full reset timeout.

It's definitely not just a timing issue for us. I can't reproduce all
the same cases as Vikas, but when I attach a USB analyzer to the ones
I do see the host controller doesn't even start sending a reset.

>>> The xHCI spec requires that when the xHCI host is reset, a USB reset is
>>> driven down the USB 3.0 ports.  If hot reset fails, the port may migrate
>>> to warm reset.  See table 32 in the xHCI spec, in the definition of
>>> HCRST.  It sounds like this host doesn't drive a USB reset down USB 3.0
>>> ports at all on host controller reset?

Oh, interesting, I hadn't seen that yet. So I guess the spec itself is
fine if it were followed to the letter.

I did some more tests about this on my Exynos machine: when I put a
device to autosuspend (U3) and manually poke the xHC reset bit, I do
see an automatic warm reset on the analyzer and the ports manage to
retrain to U0. But after a system suspend/resume which calls
xhci_reset() in the process, there is no reset on the wire. I also
noticed that it doesn't drive a reset (even after manual poking) when
there is no device connected on the other end of the analyzer.

So this might be our problem: maybe these host controllers (Synopsys
DesignWare) issue the spec-mandated warm reset only on ports where
they think there is a device attached. But after a system
suspend/resume (where the whole IP block on the SoC was powered down),
the host controller cannot know that there is still a device with an
active power session attached, and therefore doesn't drive the reset
on its own.

Even though this is a host controller bug, we still have to deal with
it somehow. I guess we could move the code into xhci_plat_resume() and
hide it behind a quirk to lessen the impact. But since reset_resume is
not a common case for most host controllers, it's hard to say if this
is DesignWare specific or a more widespread implementation mistake.
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Dec. 12, 2013, 4:05 p.m. UTC | #3
On Wed, 11 Dec 2013, Julius Werner wrote:

> >> ...although, the spec says that it does not wait for the port resets
> >> to complete.  As far as I can see re-issuing a warm reset and waiting
> >> is the only way to guarantee the core times the recovery.  Presumably
> >> the portstatus debounce in hub_activate() mitigates this, but that
> >> 100ms is less than a full reset timeout.
> 
> It's definitely not just a timing issue for us. I can't reproduce all
> the same cases as Vikas, but when I attach a USB analyzer to the ones
> I do see the host controller doesn't even start sending a reset.
> 
> >>> The xHCI spec requires that when the xHCI host is reset, a USB reset is
> >>> driven down the USB 3.0 ports.  If hot reset fails, the port may migrate
> >>> to warm reset.  See table 32 in the xHCI spec, in the definition of
> >>> HCRST.  It sounds like this host doesn't drive a USB reset down USB 3.0
> >>> ports at all on host controller reset?
> 
> Oh, interesting, I hadn't seen that yet. So I guess the spec itself is
> fine if it were followed to the letter.
> 
> I did some more tests about this on my Exynos machine: when I put a
> device to autosuspend (U3) and manually poke the xHC reset bit, I do
> see an automatic warm reset on the analyzer and the ports manage to
> retrain to U0. But after a system suspend/resume which calls
> xhci_reset() in the process, there is no reset on the wire. I also
> noticed that it doesn't drive a reset (even after manual poking) when
> there is no device connected on the other end of the analyzer.
> 
> So this might be our problem: maybe these host controllers (Synopsys
> DesignWare) issue the spec-mandated warm reset only on ports where
> they think there is a device attached. But after a system
> suspend/resume (where the whole IP block on the SoC was powered down),
> the host controller cannot know that there is still a device with an
> active power session attached, and therefore doesn't drive the reset
> on its own.
> 
> Even though this is a host controller bug, we still have to deal with
> it somehow. I guess we could move the code into xhci_plat_resume() and
> hide it behind a quirk to lessen the impact. But since reset_resume is
> not a common case for most host controllers, it's hard to say if this
> is DesignWare specific or a more widespread implementation mistake.

I was going to suggest something along these lines too.  This seems to 
be a bug in xHCI.  Therefore the fix belongs in xhci-hcd, not in the 
hub driver.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sarah Sharp Dec. 13, 2013, 5:48 p.m. UTC | #4
On Thu, Dec 12, 2013 at 11:05:04AM -0500, Alan Stern wrote:
> On Wed, 11 Dec 2013, Julius Werner wrote:
> 
> > >> ...although, the spec says that it does not wait for the port resets
> > >> to complete.  As far as I can see re-issuing a warm reset and waiting
> > >> is the only way to guarantee the core times the recovery.  Presumably
> > >> the portstatus debounce in hub_activate() mitigates this, but that
> > >> 100ms is less than a full reset timeout.
> > 
> > It's definitely not just a timing issue for us. I can't reproduce all
> > the same cases as Vikas, but when I attach a USB analyzer to the ones
> > I do see the host controller doesn't even start sending a reset.
> > 
> > >>> The xHCI spec requires that when the xHCI host is reset, a USB reset is
> > >>> driven down the USB 3.0 ports.  If hot reset fails, the port may migrate
> > >>> to warm reset.  See table 32 in the xHCI spec, in the definition of
> > >>> HCRST.  It sounds like this host doesn't drive a USB reset down USB 3.0
> > >>> ports at all on host controller reset?
> > 
> > Oh, interesting, I hadn't seen that yet. So I guess the spec itself is
> > fine if it were followed to the letter.
> > 
> > I did some more tests about this on my Exynos machine: when I put a
> > device to autosuspend (U3) and manually poke the xHC reset bit, I do
> > see an automatic warm reset on the analyzer and the ports manage to
> > retrain to U0. But after a system suspend/resume which calls
> > xhci_reset() in the process, there is no reset on the wire. I also
> > noticed that it doesn't drive a reset (even after manual poking) when
> > there is no device connected on the other end of the analyzer.
> > 
> > So this might be our problem: maybe these host controllers (Synopsys
> > DesignWare) issue the spec-mandated warm reset only on ports where
> > they think there is a device attached. But after a system
> > suspend/resume (where the whole IP block on the SoC was powered down),
> > the host controller cannot know that there is still a device with an
> > active power session attached, and therefore doesn't drive the reset
> > on its own.

Ok, that makes some sense.  I could see why host controllers wouldn't
want to drive reset on an unconnected port.

> > Even though this is a host controller bug, we still have to deal with
> > it somehow. I guess we could move the code into xhci_plat_resume() and
> > hide it behind a quirk to lessen the impact. But since reset_resume is
> > not a common case for most host controllers, it's hard to say if this
> > is DesignWare specific or a more widespread implementation mistake.
> 
> I was going to suggest something along these lines too.  This seems to 
> be a bug in xHCI.  Therefore the fix belongs in xhci-hcd, not in the 
> hub driver.

I agree.  Is there a chance that the Synopsys DesignWare will be a PCI
device instead of a platform device?  If so, it would be better to put
the code into xhci_resume instead of xhci_plat_resume.  That also allows
you to only issue the warm reset when the register restore state command
fails, after the xhci_reset call.

Also, I assume that other systems with the Synopsys DesignWare IP will
experience this issue?  I know of at least two other chipsets that will
include that IP, and it would be good to find a way to trigger on the
Synopsys IP, rather than off xHCI PCI vendor and device ID.  Otherwise
we'll be adding PCI IDs to the xHCI driver quirks for many many kernels
to come.

I'm actually leaning towards enabling the check for warm reset broadly.
It seems like it wouldn't hurt to issue a warm reset on the USB 3.0
ports if they're in compliance, poll, or rx.detect.  So, let's enable
this broadly in xhci_resume, mark the patch for stable, but ask for the
backport to be delayed until 3.13.3 is out, to allow for more testing.
If anyone complains of xHCI behavior changes, we'll change the code to
add a quirk.

Sarah Sharp
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams Dec. 13, 2013, 5:57 p.m. UTC | #5
On Fri, Dec 13, 2013 at 9:48 AM, Sarah Sharp
<sarah.a.sharp@linux.intel.com> wrote:
> On Thu, Dec 12, 2013 at 11:05:04AM -0500, Alan Stern wrote:
>> On Wed, 11 Dec 2013, Julius Werner wrote:
>>
>> > >> ...although, the spec says that it does not wait for the port resets
>> > >> to complete.  As far as I can see re-issuing a warm reset and waiting
>> > >> is the only way to guarantee the core times the recovery.  Presumably
>> > >> the portstatus debounce in hub_activate() mitigates this, but that
>> > >> 100ms is less than a full reset timeout.
>> >
>> > It's definitely not just a timing issue for us. I can't reproduce all
>> > the same cases as Vikas, but when I attach a USB analyzer to the ones
>> > I do see the host controller doesn't even start sending a reset.
>> >
>> > >>> The xHCI spec requires that when the xHCI host is reset, a USB reset is
>> > >>> driven down the USB 3.0 ports.  If hot reset fails, the port may migrate
>> > >>> to warm reset.  See table 32 in the xHCI spec, in the definition of
>> > >>> HCRST.  It sounds like this host doesn't drive a USB reset down USB 3.0
>> > >>> ports at all on host controller reset?
>> >
>> > Oh, interesting, I hadn't seen that yet. So I guess the spec itself is
>> > fine if it were followed to the letter.
>> >
>> > I did some more tests about this on my Exynos machine: when I put a
>> > device to autosuspend (U3) and manually poke the xHC reset bit, I do
>> > see an automatic warm reset on the analyzer and the ports manage to
>> > retrain to U0. But after a system suspend/resume which calls
>> > xhci_reset() in the process, there is no reset on the wire. I also
>> > noticed that it doesn't drive a reset (even after manual poking) when
>> > there is no device connected on the other end of the analyzer.
>> >
>> > So this might be our problem: maybe these host controllers (Synopsys
>> > DesignWare) issue the spec-mandated warm reset only on ports where
>> > they think there is a device attached. But after a system
>> > suspend/resume (where the whole IP block on the SoC was powered down),
>> > the host controller cannot know that there is still a device with an
>> > active power session attached, and therefore doesn't drive the reset
>> > on its own.
>
> Ok, that makes some sense.  I could see why host controllers wouldn't
> want to drive reset on an unconnected port.
>
>> > Even though this is a host controller bug, we still have to deal with
>> > it somehow. I guess we could move the code into xhci_plat_resume() and
>> > hide it behind a quirk to lessen the impact. But since reset_resume is
>> > not a common case for most host controllers, it's hard to say if this
>> > is DesignWare specific or a more widespread implementation mistake.
>>
>> I was going to suggest something along these lines too.  This seems to
>> be a bug in xHCI.  Therefore the fix belongs in xhci-hcd, not in the
>> hub driver.
>
> I agree.  Is there a chance that the Synopsys DesignWare will be a PCI
> device instead of a platform device?  If so, it would be better to put
> the code into xhci_resume instead of xhci_plat_resume.  That also allows
> you to only issue the warm reset when the register restore state command
> fails, after the xhci_reset call.
>
> Also, I assume that other systems with the Synopsys DesignWare IP will
> experience this issue?  I know of at least two other chipsets that will
> include that IP, and it would be good to find a way to trigger on the
> Synopsys IP, rather than off xHCI PCI vendor and device ID.  Otherwise
> we'll be adding PCI IDs to the xHCI driver quirks for many many kernels
> to come.
>
> I'm actually leaning towards enabling the check for warm reset broadly.
> It seems like it wouldn't hurt to issue a warm reset on the USB 3.0
> ports if they're in compliance, poll, or rx.detect.  So, let's enable
> this broadly in xhci_resume, mark the patch for stable, but ask for the
> backport to be delayed until 3.13.3 is out, to allow for more testing.
> If anyone complains of xHCI behavior changes, we'll change the code to
> add a quirk.

Is there a clean way to make this per-port rather than globally at
xhci_resume()?  I am looking to hook into this for port power recovery
as Tianyu's testing encountered "warm reset required" conditions at
runtime_resume.  I'm still on the hunt for a solid reproducer, but it
indicates this is a more general quirk with power session recovery.
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Cohen Dec. 13, 2013, 6 p.m. UTC | #6
Hi Sarah,

On Fri, Dec 13, 2013 at 09:48:15AM -0800, Sarah Sharp wrote:
> On Thu, Dec 12, 2013 at 11:05:04AM -0500, Alan Stern wrote:
> > On Wed, 11 Dec 2013, Julius Werner wrote:
> > 
> > > >> ...although, the spec says that it does not wait for the port resets
> > > >> to complete.  As far as I can see re-issuing a warm reset and waiting
> > > >> is the only way to guarantee the core times the recovery.  Presumably
> > > >> the portstatus debounce in hub_activate() mitigates this, but that
> > > >> 100ms is less than a full reset timeout.
> > > 
> > > It's definitely not just a timing issue for us. I can't reproduce all
> > > the same cases as Vikas, but when I attach a USB analyzer to the ones
> > > I do see the host controller doesn't even start sending a reset.
> > > 
> > > >>> The xHCI spec requires that when the xHCI host is reset, a USB reset is
> > > >>> driven down the USB 3.0 ports.  If hot reset fails, the port may migrate
> > > >>> to warm reset.  See table 32 in the xHCI spec, in the definition of
> > > >>> HCRST.  It sounds like this host doesn't drive a USB reset down USB 3.0
> > > >>> ports at all on host controller reset?
> > > 
> > > Oh, interesting, I hadn't seen that yet. So I guess the spec itself is
> > > fine if it were followed to the letter.
> > > 
> > > I did some more tests about this on my Exynos machine: when I put a
> > > device to autosuspend (U3) and manually poke the xHC reset bit, I do
> > > see an automatic warm reset on the analyzer and the ports manage to
> > > retrain to U0. But after a system suspend/resume which calls
> > > xhci_reset() in the process, there is no reset on the wire. I also
> > > noticed that it doesn't drive a reset (even after manual poking) when
> > > there is no device connected on the other end of the analyzer.
> > > 
> > > So this might be our problem: maybe these host controllers (Synopsys
> > > DesignWare) issue the spec-mandated warm reset only on ports where
> > > they think there is a device attached. But after a system
> > > suspend/resume (where the whole IP block on the SoC was powered down),
> > > the host controller cannot know that there is still a device with an
> > > active power session attached, and therefore doesn't drive the reset
> > > on its own.
> 
> Ok, that makes some sense.  I could see why host controllers wouldn't
> want to drive reset on an unconnected port.
> 
> > > Even though this is a host controller bug, we still have to deal with
> > > it somehow. I guess we could move the code into xhci_plat_resume() and
> > > hide it behind a quirk to lessen the impact. But since reset_resume is
> > > not a common case for most host controllers, it's hard to say if this
> > > is DesignWare specific or a more widespread implementation mistake.
> > 
> > I was going to suggest something along these lines too.  This seems to 
> > be a bug in xHCI.  Therefore the fix belongs in xhci-hcd, not in the 
> > hub driver.
> 
> I agree.  Is there a chance that the Synopsys DesignWare will be a PCI
> device instead of a platform device?  If so, it would be better to put
> the code into xhci_resume instead of xhci_plat_resume.  That also allows

DWC3 on Intel Baytrail and Merrifield is PCI device.

Br, David

--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern Dec. 13, 2013, 6:15 p.m. UTC | #7
On Fri, 13 Dec 2013, Dan Williams wrote:

> > I'm actually leaning towards enabling the check for warm reset broadly.
> > It seems like it wouldn't hurt to issue a warm reset on the USB 3.0
> > ports if they're in compliance, poll, or rx.detect.  So, let's enable
> > this broadly in xhci_resume, mark the patch for stable, but ask for the
> > backport to be delayed until 3.13.3 is out, to allow for more testing.
> > If anyone complains of xHCI behavior changes, we'll change the code to
> > add a quirk.
> 
> Is there a clean way to make this per-port rather than globally at
> xhci_resume()?  I am looking to hook into this for port power recovery
> as Tianyu's testing encountered "warm reset required" conditions at
> runtime_resume.  I'm still on the hunt for a solid reproducer, but it
> indicates this is a more general quirk with power session recovery.

There's no reason you can't do per-port testing inside xhci_resume
(assuming you know what to test for) as well as putting a warm reset in
the port-power handler of xhci_hub_control.

Of course, doing simultaneous warm resets on multiple ports will use 
less time than resetting each port individually, in sequence.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dan Williams Dec. 13, 2013, 6:27 p.m. UTC | #8
On Fri, Dec 13, 2013 at 10:15 AM, Alan Stern <stern@rowland.harvard.edu> wrote:
> On Fri, 13 Dec 2013, Dan Williams wrote:
>
>> > I'm actually leaning towards enabling the check for warm reset broadly.
>> > It seems like it wouldn't hurt to issue a warm reset on the USB 3.0
>> > ports if they're in compliance, poll, or rx.detect.  So, let's enable
>> > this broadly in xhci_resume, mark the patch for stable, but ask for the
>> > backport to be delayed until 3.13.3 is out, to allow for more testing.
>> > If anyone complains of xHCI behavior changes, we'll change the code to
>> > add a quirk.
>>
>> Is there a clean way to make this per-port rather than globally at
>> xhci_resume()?  I am looking to hook into this for port power recovery
>> as Tianyu's testing encountered "warm reset required" conditions at
>> runtime_resume.  I'm still on the hunt for a solid reproducer, but it
>> indicates this is a more general quirk with power session recovery.
>
> There's no reason you can't do per-port testing inside xhci_resume
> (assuming you know what to test for) as well as putting a warm reset in
> the port-power handler of xhci_hub_control.

I'm just uneasy putting the recovery there as we lose the context of
why the port was powered-on.  For example we don't want to
pre-empt/duplicate a reset in xhci_hub_control() that is already
specified in hub_events().

> Of course, doing simultaneous warm resets on multiple ports will use
> less time than resetting each port individually, in sequence.
>

For the hub resume case, yes.  For pm_runtime_resume of an individual
port I believe it needs to be synchronous.
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Cohen Dec. 13, 2013, 6:34 p.m. UTC | #9
On Fri, Dec 13, 2013 at 10:00:28AM -0800, David Cohen wrote:
> Hi Sarah,
> 
> On Fri, Dec 13, 2013 at 09:48:15AM -0800, Sarah Sharp wrote:
> > On Thu, Dec 12, 2013 at 11:05:04AM -0500, Alan Stern wrote:
> > > On Wed, 11 Dec 2013, Julius Werner wrote:
> > > 
> > > > >> ...although, the spec says that it does not wait for the port resets
> > > > >> to complete.  As far as I can see re-issuing a warm reset and waiting
> > > > >> is the only way to guarantee the core times the recovery.  Presumably
> > > > >> the portstatus debounce in hub_activate() mitigates this, but that
> > > > >> 100ms is less than a full reset timeout.
> > > > 
> > > > It's definitely not just a timing issue for us. I can't reproduce all
> > > > the same cases as Vikas, but when I attach a USB analyzer to the ones
> > > > I do see the host controller doesn't even start sending a reset.
> > > > 
> > > > >>> The xHCI spec requires that when the xHCI host is reset, a USB reset is
> > > > >>> driven down the USB 3.0 ports.  If hot reset fails, the port may migrate
> > > > >>> to warm reset.  See table 32 in the xHCI spec, in the definition of
> > > > >>> HCRST.  It sounds like this host doesn't drive a USB reset down USB 3.0
> > > > >>> ports at all on host controller reset?
> > > > 
> > > > Oh, interesting, I hadn't seen that yet. So I guess the spec itself is
> > > > fine if it were followed to the letter.
> > > > 
> > > > I did some more tests about this on my Exynos machine: when I put a
> > > > device to autosuspend (U3) and manually poke the xHC reset bit, I do
> > > > see an automatic warm reset on the analyzer and the ports manage to
> > > > retrain to U0. But after a system suspend/resume which calls
> > > > xhci_reset() in the process, there is no reset on the wire. I also
> > > > noticed that it doesn't drive a reset (even after manual poking) when
> > > > there is no device connected on the other end of the analyzer.
> > > > 
> > > > So this might be our problem: maybe these host controllers (Synopsys
> > > > DesignWare) issue the spec-mandated warm reset only on ports where
> > > > they think there is a device attached. But after a system
> > > > suspend/resume (where the whole IP block on the SoC was powered down),
> > > > the host controller cannot know that there is still a device with an
> > > > active power session attached, and therefore doesn't drive the reset
> > > > on its own.
> > 
> > Ok, that makes some sense.  I could see why host controllers wouldn't
> > want to drive reset on an unconnected port.
> > 
> > > > Even though this is a host controller bug, we still have to deal with
> > > > it somehow. I guess we could move the code into xhci_plat_resume() and
> > > > hide it behind a quirk to lessen the impact. But since reset_resume is
> > > > not a common case for most host controllers, it's hard to say if this
> > > > is DesignWare specific or a more widespread implementation mistake.
> > > 
> > > I was going to suggest something along these lines too.  This seems to 
> > > be a bug in xHCI.  Therefore the fix belongs in xhci-hcd, not in the 
> > > hub driver.
> > 
> > I agree.  Is there a chance that the Synopsys DesignWare will be a PCI
> > device instead of a platform device?  If so, it would be better to put
> > the code into xhci_resume instead of xhci_plat_resume.  That also allows
> 
> DWC3 on Intel Baytrail and Merrifield is PCI device.

But it actually registers xHCI's platform device to probe it. So,
nevermind.

Br, David
--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index a7c04e24ca48..30ce237569dd 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -2783,8 +2783,14 @@  static int check_port_resume_type(struct
usb_device *udev,
                struct usb_hub *hub, int port1,
                int status, unsigned portchange, unsigned portstatus)
 {
+       /* Did the port go SS.Inactive?  Even if ->persist_enabled is
cleared the
+        * device won't come back until a warm reset completes
+        */
+       if (hub_port_warm_reset_required(hub, portstatus)) {
+               udev->reset_resume = 1;
+               udev->reset_resume_warm = 1;
        /* Is the device still present? */
-       if (status || port_is_suspended(hub, portstatus) ||
+       } else if (status || port_is_suspended(hub, portstatus) ||
                        !port_is_power_on(hub, portstatus) ||
                        !(portstatus & USB_PORT_STAT_CONNECTION)) {
                if (status >= 0)
@@ -4022,7 +4028,8 @@  hub_port_init (struct usb_hub *hub, struct
usb_device *udev, int port1,

        /* Reset the device; full speed may morph to high speed */
        /* FIXME a USB 2.0 device may morph into SuperSpeed on reset. */
-       retval = hub_port_reset(hub, port1, udev, delay, false);
+       retval = hub_port_reset(hub, port1, udev, delay,
+                               udev->reset_resume_warm);
        if (retval < 0)         /* error or disconnect */
                goto fail;
        /* success, speed is known */
@@ -4730,7 +4737,8 @@  static void hub_events(void)

                /* deal with port status changes */
                for (i = 1; i <= hdev->maxchild; i++) {
-                       if (test_bit(i, hub->busy_bits))
+                       if (test_bit(i, hub->busy_bits) ||
+                           test_bit(i, hub->delayed_change_bits))
                                continue;
                        connect_change = test_bit(i, hub->change_bits);
                        wakeup_change = test_and_clear_bit(i, hub->wakeup_bits);
diff --git a/include/linux/usb.h b/include/linux/usb.h
index 7454865ad148..ff1b6fe4a0ff 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -572,6 +572,7 @@  struct usb_device {

        unsigned do_remote_wakeup:1;
        unsigned reset_resume:1;
+       unsigned reset_resume_warm:1;
        unsigned port_is_suspended:1;
 #endif
        struct wusb_dev *wusb_dev;

>
>> The host port starts out back in
>> Rx.Detect without remembering anything about its previous state, but
>> the device is still in U3. The host detects Rx terminations, moves to
>> Polling and starts sending LFPS link training packets, but the device
>> doesn't expect those and interprets them as link problems (moving to
>> Recovery). What happens next seems to be device specific, but
>> apparently the device can end up in SS.Inactive while the host port
>> gets stuck in Polling or Recovery (or some kind of livelock between
>> those).

In testing the port power patches I see this particular device give up
on its superspeed connection if it sees too many link failures and
fallsback to connecting via its high speed connection.

>> This patch tries to warm reset all USB 3.0 ports on reset-resume
>> (after xhci_reset() was called) that had devices connected to them
>> before suspend. This seems to be the only way to ensure the devices'
>> state machines get back to a well-defined state that the host can work
>> with. I don't think this is a specific hardware bug, it's just an
>> unfortunate design flaw that the USB 3.0 spec doesn't account for a
>> root hub port being reset independently of its connected device. I
>> think Sarah is correct that it could be limited to root hubs, though.
>

I think we also need something like the following to hold off khubd
while further resume actions may be taking place.  Again we're
mitigated by the debounce delay in most cases, but if a warm reset
recovery is going to take longer than that khubd needs to be held off
for the given port.

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 30ce237569dd..a99e3eb28aa5 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -1173,22 +1173,25 @@  static void hub_activate(struct usb_hub *hub,
enum hub_activation_type type)
                                                port_resumed))
                                set_bit(port1, hub->change_bits);

-               } else if (udev->persist_enabled) {
+               } else {
                        struct usb_port *port_dev = hub->ports[port1 - 1];

+                       if (udev->persist_enabled) {
 #ifdef CONFIG_PM
-                       udev->reset_resume = 1;
+                               udev->reset_resume = 1;
 #endif
-                       /* Don't set the change_bits when the device
-                        * was powered off.
-                        */
-                       if (port_dev->power_is_on)
-                               set_bit(port1, hub->change_bits);
+                               /* Don't set the change_bits when the device
+                                * was powered off.
+                                */
+                               if (port_dev->power_is_on)
+                                       set_bit(port1, hub->change_bits);
+                       }

-               } else {
-                       /* The power session is gone; tell khubd */
-                       usb_set_device_state(udev, USB_STATE_NOTATTACHED);
-                       set_bit(port1, hub->change_bits);
+                       /* The power session may be gone; wait for port
+                        * recovery before letting khubd take action on
+                        * this port
+                        */
+                       set_bit(port1, hub->delayed_change_bits);
                }
        }

@@ -3285,6 +3288,11 @@  int usb_port_resume(struct usb_device *udev,
pm_message_t msg)
                usb_unlocked_enable_lpm(udev);
        }

+       if (test_and_clear_bit(port1, hub->delayed_change_bits)) {
+               set_bit(port1, hub->change_bits);
+               kick_khubd(hub);
+       }
+
        return status;
 }

diff --git a/drivers/usb/core/hub.h b/drivers/usb/core/hub.h
index 4e4790dea343..74e87c0380f8 100644
--- a/drivers/usb/core/hub.h
+++ b/drivers/usb/core/hub.h
@@ -45,8 +45,9 @@  struct usb_hub {
        unsigned long           event_bits[1];  /* status change bitmask */
        unsigned long           change_bits[1]; /* ports with logical connect
                                                        status change */
-       unsigned long           busy_bits[1];   /* ports being reset or
-                                                       resumed */
+       unsigned long           busy_bits[1];   /* ports being reset */
+       unsigned long           delayed_change_bits[1]; /* ports awaiting
+                                                          recovery */
        unsigned long           removed_bits[1]; /* ports with a "removed"
                                                        device present */
        unsigned long           wakeup_bits[1]; /* ports that have signaled