Message ID | 20220407115918.1.I8226c7fdae88329ef70957b96a39b346c69a914e@changeid (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | USB: hcd-pci: Fully suspend across freeze/thaw cycle | expand |
On Thu, Apr 07, 2022 at 11:59:55AM -0700, Evan Green wrote: > The documentation for the freeze() method says that it "should quiesce > the device so that it doesn't generate IRQs or DMA". The unspoken > consequence of not doing this is that MSIs aimed at non-boot CPUs may > get fully lost if they're sent during the period where the target CPU is > offline. > > The current callbacks for USB HCD do not fully quiesce interrupts, > specifically on XHCI. Change to use the full suspend/resume flow for > freeze/thaw to ensure interrupts are fully quiesced. This fixes issues > where USB devices fail to thaw during hibernation because XHCI misses > its interrupt and fails to recover. I don't think your interpretation is quite right. The problem doesn't lie in the HCD callbacks but rather in the root-hub callbacks. Correct me if I'm wrong about xHCI, but AFAIK the host controller doesn't issue any interrupt requests on its own behalf; it issues IRQs only on behalf of its root hubs. Given that the root hubs should be suspended (i.e., frozen) at this point, and hence not running, the only IRQs they might make would be for wakeup requests. So during freeze, wakeups should be disabled on root hubs. Currently I believe we don't do this; if a root hub was already runtime suspended when asked to go into freeze, its wakeup setting will remain unchanged. _That_ is the bug which needs to be fixed. (Consider what would happen if a root hub wakes up after it is frozen but before the host controller is frozen: The attempt to freeze the host controller would fail, causing the entire hibernation transition to fail.) The whole issue of how wakeup requests should be handled during hibernation is a difficult one. I don't think we have any good protection against cases where a wakeup request races with the system entering hibernation. For instance, if a wakeup event occurs after we go into the thaw state, it won't even be recognized as such because the system will be running normally and will handle it as an ordinary event. But then it will be consumed, so the wakeup signal won't remain on to reactivate the system once it has shut down, and when the stored kernel image is eventually restored it won't remember that the event ever happened. Alan Stern > Signed-off-by: Evan Green <evgreen@chromium.org> --- > > You may be able to reproduce this issue on your own machine via the > following: > 1. Disable runtime PM on your XHCI controller > 2. Aim your XHCI IRQ at a non-boot CPU (replace 174): echo 2 > > /proc/irq/174/smp_affinity > 3. Attempt to hibernate (no need to actually go all the way down). > > I run 2 and 3 in a loop, and can usually hit a hang or dead XHCI > controller within 1-2 iterations. I happened to notice this on an > Alderlake system where runtime PM is accidentally disabled for one of > the XHCI controllers. Some more discussion and debugging can be found at > [1]. > > [1] https://lore.kernel.org/linux-pci/CAE=gft4a-QL82iFJE_xRQ3JrMmz-KZKWREtz=MghhjFbJeK=8A@mail.gmail.com/T/#u > > --- > drivers/usb/core/hcd-pci.c | 8 ++++---- > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c > index 8176bc81a635d6..e02506807ffc6c 100644 > --- a/drivers/usb/core/hcd-pci.c > +++ b/drivers/usb/core/hcd-pci.c > @@ -616,10 +616,10 @@ const struct dev_pm_ops usb_hcd_pci_pm_ops = { > .suspend_noirq = hcd_pci_suspend_noirq, > .resume_noirq = hcd_pci_resume_noirq, > .resume = hcd_pci_resume, > - .freeze = check_root_hub_suspended, > - .freeze_noirq = check_root_hub_suspended, > - .thaw_noirq = NULL, > - .thaw = NULL, > + .freeze = hcd_pci_suspend, > + .freeze_noirq = hcd_pci_suspend_noirq, > + .thaw_noirq = hcd_pci_resume_noirq, > + .thaw = hcd_pci_resume, > .poweroff = hcd_pci_suspend, > .poweroff_noirq = hcd_pci_suspend_noirq, > .restore_noirq = hcd_pci_resume_noirq, > -- > 2.31.0 >
Hi Alan, On Fri, Apr 8, 2022 at 7:29 AM Alan Stern <stern@rowland.harvard.edu> wrote: > > On Thu, Apr 07, 2022 at 11:59:55AM -0700, Evan Green wrote: > > The documentation for the freeze() method says that it "should quiesce > > the device so that it doesn't generate IRQs or DMA". The unspoken > > consequence of not doing this is that MSIs aimed at non-boot CPUs may > > get fully lost if they're sent during the period where the target CPU is > > offline. > > > > The current callbacks for USB HCD do not fully quiesce interrupts, > > specifically on XHCI. Change to use the full suspend/resume flow for > > freeze/thaw to ensure interrupts are fully quiesced. This fixes issues > > where USB devices fail to thaw during hibernation because XHCI misses > > its interrupt and fails to recover. > > I don't think your interpretation is quite right. The problem doesn't lie > in the HCD callbacks but rather in the root-hub callbacks. > > Correct me if I'm wrong about xHCI, but AFAIK the host controller doesn't > issue any interrupt requests on its own behalf; it issues IRQs only on > behalf of its root hubs. Given that the root hubs should be suspended > (i.e., frozen) at this point, and hence not running, the only IRQs they > might make would be for wakeup requests. > > So during freeze, wakeups should be disabled on root hubs. Currently I > believe we don't do this; if a root hub was already runtime suspended when > asked to go into freeze, its wakeup setting will remain unchanged. _That_ For my issue at least, it's the opposite. Enabling runtime pm on the controller significantly reduces the repro rate of the lost interrupt. I think having the controller runtime suspended reduces the overall number of interrupts that flow in, which is why my chances to hit an interrupt in this window drop, but aren't fully eliminated. I think xhci may still find reasons to generate interrupts even if all of its root hub ports are suspended without wake events. For example, won't Port Status Change Events still come in if a device is unplugged or overcurrents in between freeze() and thaw()? The spec does mention that generation of this event is gated by the HCHalted flag, but at least in my digging around I couldn't find a place where we halt the controller through this path. With how fragile xhci (and maybe others?) are towards lost interrupts, even if it does happen to be perfect now, it seems like it would be more resilient to just fully suspend the controller across this transition. I'd also put forward the hypothesis (feel free to shoot it down!) that unless there's a human-scale time penalty with this change, the downsides of being more heavy handed like this across freeze/thaw are minimal. There's always a thaw() right on the heels of freeze(), and hibernation is such a rare and jarring transition that being able to recover after the transition is more important than accomplishing the transition quickly. -Evan > is the bug which needs to be fixed. (Consider what would happen if a root > hub wakes up after it is frozen but before the host controller is frozen: > The attempt to freeze the host controller would fail, causing the entire > hibernation transition to fail.) > > The whole issue of how wakeup requests should be handled during hibernation > is a difficult one. I don't think we have any good protection against cases > where a wakeup request races with the system entering hibernation. For > instance, if a wakeup event occurs after we go into the thaw state, it won't > even be recognized as such because the system will be running normally and > will handle it as an ordinary event. But then it will be consumed, so the > wakeup signal won't remain on to reactivate the system once it has shut > down, and when the stored kernel image is eventually restored it won't > remember that the event ever happened. > > Alan Stern > > > Signed-off-by: Evan Green <evgreen@chromium.org> --- > > > > You may be able to reproduce this issue on your own machine via the > > following: > > 1. Disable runtime PM on your XHCI controller > > 2. Aim your XHCI IRQ at a non-boot CPU (replace 174): echo 2 > > > /proc/irq/174/smp_affinity > > 3. Attempt to hibernate (no need to actually go all the way down). > > > > I run 2 and 3 in a loop, and can usually hit a hang or dead XHCI > > controller within 1-2 iterations. I happened to notice this on an > > Alderlake system where runtime PM is accidentally disabled for one of > > the XHCI controllers. Some more discussion and debugging can be found at > > [1]. > > > > [1] https://lore.kernel.org/linux-pci/CAE=gft4a-QL82iFJE_xRQ3JrMmz-KZKWREtz=MghhjFbJeK=8A@mail.gmail.com/T/#u > > > > --- > > drivers/usb/core/hcd-pci.c | 8 ++++---- > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c > > index 8176bc81a635d6..e02506807ffc6c 100644 > > --- a/drivers/usb/core/hcd-pci.c > > +++ b/drivers/usb/core/hcd-pci.c > > @@ -616,10 +616,10 @@ const struct dev_pm_ops usb_hcd_pci_pm_ops = { > > .suspend_noirq = hcd_pci_suspend_noirq, > > .resume_noirq = hcd_pci_resume_noirq, > > .resume = hcd_pci_resume, > > - .freeze = check_root_hub_suspended, > > - .freeze_noirq = check_root_hub_suspended, > > - .thaw_noirq = NULL, > > - .thaw = NULL, > > + .freeze = hcd_pci_suspend, > > + .freeze_noirq = hcd_pci_suspend_noirq, > > + .thaw_noirq = hcd_pci_resume_noirq, > > + .thaw = hcd_pci_resume, > > .poweroff = hcd_pci_suspend, > > .poweroff_noirq = hcd_pci_suspend_noirq, > > .restore_noirq = hcd_pci_resume_noirq, > > -- > > 2.31.0 > >
On Fri, Apr 08, 2022 at 02:52:30PM -0700, Evan Green wrote: > Hi Alan, Hello. > On Fri, Apr 8, 2022 at 7:29 AM Alan Stern <stern@rowland.harvard.edu> wrote: > > > > On Thu, Apr 07, 2022 at 11:59:55AM -0700, Evan Green wrote: > > > The documentation for the freeze() method says that it "should quiesce > > > the device so that it doesn't generate IRQs or DMA". The unspoken > > > consequence of not doing this is that MSIs aimed at non-boot CPUs may > > > get fully lost if they're sent during the period where the target CPU is > > > offline. > > > > > > The current callbacks for USB HCD do not fully quiesce interrupts, > > > specifically on XHCI. Change to use the full suspend/resume flow for > > > freeze/thaw to ensure interrupts are fully quiesced. This fixes issues > > > where USB devices fail to thaw during hibernation because XHCI misses > > > its interrupt and fails to recover. > > > > I don't think your interpretation is quite right. The problem doesn't lie > > in the HCD callbacks but rather in the root-hub callbacks. > > > > Correct me if I'm wrong about xHCI, but AFAIK the host controller doesn't > > issue any interrupt requests on its own behalf; it issues IRQs only on > > behalf of its root hubs. Given that the root hubs should be suspended > > (i.e., frozen) at this point, and hence not running, the only IRQs they > > might make would be for wakeup requests. > > > > So during freeze, wakeups should be disabled on root hubs. Currently I > > believe we don't do this; if a root hub was already runtime suspended when > > asked to go into freeze, its wakeup setting will remain unchanged. _That_ > > For my issue at least, it's the opposite. Enabling runtime pm on the > controller significantly reduces the repro rate of the lost interrupt. That doesn't seem to make sense. If the controller is in runtime suspend at the start of hibernation, the pci_pm_freeze() routine will do a runtime resume before calling the HCD freeze function. So when the controller gets put into the freeze state, it is guaranteed not to be runtime suspended regardless of what you enable. > I think having the controller runtime suspended reduces the overall > number of interrupts that flow in, which is why my chances to hit an > interrupt in this window drop, but aren't fully eliminated. When you ran your tests, was wakeup enabled for the host controller? > I think xhci may still find reasons to generate interrupts even if all > of its root hub ports are suspended without wake events. For example, > won't Port Status Change Events still come in if a device is unplugged > or overcurrents in between freeze() and thaw()? I'm not an expert on xHCI or xhci-hcd. For that, we should ask the xhci-hcd maintainer (CC'ed). In fact, he should have been CC'ed on the original patch since it was meant to fix a problem involving xHCI controllers. With EHCI, for example, if a port status change event occurs while the root hub is suspended with wakeups disabled, no interrupt request will be generated because the port-specific WKOC_E, WKDSCNNT_E, and WKCNNT_E (Wake on Over-Current Enable, Wake on Disconnect Enable, and Wake on Connect Enable) bits are turned off. In effect, the port-status change events can occur but they aren't treated as wakeup events. > The spec does mention > that generation of this event is gated by the HCHalted flag, but at > least in my digging around I couldn't find a place where we halt the > controller through this path. Bear in mind that suspending the controller and suspending the root hub are two different things. > With how fragile xhci (and maybe > others?) are towards lost interrupts, even if it does happen to be > perfect now, it seems like it would be more resilient to just fully > suspend the controller across this transition. Suspending the controller won't fix the problem if the wakeup settings for the root hubs are wrong (although it may reduce the window for a race, like what you mentioned above). Conversely, if the wakeup settings for the root hubs are correct then suspending the controller shouldn't make any difference. > I'd also put forward the hypothesis (feel free to shoot it down!) that > unless there's a human-scale time penalty with this change, the > downsides of being more heavy handed like this across freeze/thaw are > minimal. There's always a thaw() right on the heels of freeze(), and > hibernation is such a rare and jarring transition that being able to > recover after the transition is more important than accomplishing the > transition quickly. That's true, but it ignores the underlying problem described in the preceding paragraphs. Alan Stern
Hi On 9.4.2022 4.58, Alan Stern wrote: > On Fri, Apr 08, 2022 at 02:52:30PM -0700, Evan Green wrote: >> Hi Alan, > > Hello. > >> On Fri, Apr 8, 2022 at 7:29 AM Alan Stern <stern@rowland.harvard.edu> wrote: >>> >>> On Thu, Apr 07, 2022 at 11:59:55AM -0700, Evan Green wrote: >>>> The documentation for the freeze() method says that it "should quiesce >>>> the device so that it doesn't generate IRQs or DMA". The unspoken >>>> consequence of not doing this is that MSIs aimed at non-boot CPUs may >>>> get fully lost if they're sent during the period where the target CPU is >>>> offline. >>>> >>>> The current callbacks for USB HCD do not fully quiesce interrupts, >>>> specifically on XHCI. Change to use the full suspend/resume flow for >>>> freeze/thaw to ensure interrupts are fully quiesced. This fixes issues >>>> where USB devices fail to thaw during hibernation because XHCI misses >>>> its interrupt and fails to recover. >>> >>> I don't think your interpretation is quite right. The problem doesn't lie >>> in the HCD callbacks but rather in the root-hub callbacks. >>> >>> Correct me if I'm wrong about xHCI, but AFAIK the host controller doesn't >>> issue any interrupt requests on its own behalf; it issues IRQs only on >>> behalf of its root hubs. Given that the root hubs should be suspended >>> (i.e., frozen) at this point, and hence not running, the only IRQs they >>> might make would be for wakeup requests. >>> >>> So during freeze, wakeups should be disabled on root hubs. Currently I >>> believe we don't do this; if a root hub was already runtime suspended when >>> asked to go into freeze, its wakeup setting will remain unchanged. _That_ In xHCI case freeze will suspend the roothub and make sure all connected devices are in suspended U3 state, but it won't prevent interrupts. And yes, my understanding is also that if devices were runtime suspended with wake enabled before freeze, then devices can can initiate resume any time in the first stages of hibernate (freeze-thaw), causing an interrupt. We can reduce interrupts by disabling device wake in freeze, but any port change can still cause interrupts. >> >> For my issue at least, it's the opposite. Enabling runtime pm on the >> controller significantly reduces the repro rate of the lost interrupt. > > That doesn't seem to make sense. If the controller is in runtime suspend at > the start of hibernation, the pci_pm_freeze() routine will do a runtime > resume before calling the HCD freeze function. So when the controller gets > put into the freeze state, it is guaranteed not to be runtime suspended > regardless of what you enable. > >> I think having the controller runtime suspended reduces the overall >> number of interrupts that flow in, which is why my chances to hit an >> interrupt in this window drop, but aren't fully eliminated. > > When you ran your tests, was wakeup enabled for the host controller? > >> I think xhci may still find reasons to generate interrupts even if all >> of its root hub ports are suspended without wake events. For example, >> won't Port Status Change Events still come in if a device is unplugged >> or overcurrents in between freeze() and thaw()? Yes, as long as host is running, and host is running between freeze and thaw. > > I'm not an expert on xHCI or xhci-hcd. For that, we should ask the xhci-hcd > maintainer (CC'ed). In fact, he should have been CC'ed on the original > patch since it was meant to fix a problem involving xHCI controllers. > > With EHCI, for example, if a port status change event occurs while the root > hub is suspended with wakeups disabled, no interrupt request will be > generated because the port-specific WKOC_E, WKDSCNNT_E, and WKCNNT_E (Wake > on Over-Current Enable, Wake on Disconnect Enable, and Wake on Connect > Enable) bits are turned off. In effect, the port-status change events can > occur but they aren't treated as wakeup events. The port-specific wake flags in xHCI only affects interrupt and wake generation for a suspended host. In the freeze() to thaw() stage host is running so flags don't have any effect > >> The spec does mention >> that generation of this event is gated by the HCHalted flag, but at >> least in my digging around I couldn't find a place where we halt the >> controller through this path. > > Bear in mind that suspending the controller and suspending the root hub are > two different things. > >> With how fragile xhci (and maybe >> others?) are towards lost interrupts, even if it does happen to be >> perfect now, it seems like it would be more resilient to just fully >> suspend the controller across this transition. > > Suspending the controller won't fix the problem if the wakeup settings for > the root hubs are wrong (although it may reduce the window for a race, like > what you mentioned above). Conversely, if the wakeup settings for the root > hubs are correct then suspending the controller shouldn't make any > difference. > >> I'd also put forward the hypothesis (feel free to shoot it down!) that >> unless there's a human-scale time penalty with this change, the >> downsides of being more heavy handed like this across freeze/thaw are >> minimal. There's always a thaw() right on the heels of freeze(), and >> hibernation is such a rare and jarring transition that being able to >> recover after the transition is more important than accomplishing the >> transition quickly. > > That's true, but it ignores the underlying problem described in the > preceding paragraphs. > Would it make sense prevent xHCI interrupt generation in the host freeze() stage, clearing the xHCI EINT bit in addition to calling check_roothub_suspend()? Then enable it back in thaw() Thanks -Mathias
On Mon, Apr 11, 2022 at 01:43:15PM +0300, Mathias Nyman wrote: > Hi > > On 9.4.2022 4.58, Alan Stern wrote: > >>> So during freeze, wakeups should be disabled on root hubs. Currently I > >>> believe we don't do this; if a root hub was already runtime suspended when > >>> asked to go into freeze, its wakeup setting will remain unchanged. _That_ > > In xHCI case freeze will suspend the roothub and make sure all connected devices > are in suspended U3 state, but it won't prevent interrupts. > > And yes, my understanding is also that if devices were runtime suspended with wake > enabled before freeze, then devices can can initiate resume any time in the first > stages of hibernate (freeze-thaw), causing an interrupt. Yeah. I think we need to change that. > We can reduce interrupts by disabling device wake in freeze, but any port change > can still cause interrupts. Are you sure about this? Disabling wakeup for the root-hub device is supposed to prevent interrupts from being generated when a port-change event happens. > >> I think xhci may still find reasons to generate interrupts even if all > >> of its root hub ports are suspended without wake events. For example, > >> won't Port Status Change Events still come in if a device is unplugged > >> or overcurrents in between freeze() and thaw()? > > Yes, as long as host is running, and host is running between freeze and thaw. It's okay for the events to come in, but if wakeup is disabled on the root hub then the events should not cause an interrupt request. > > I'm not an expert on xHCI or xhci-hcd. For that, we should ask the xhci-hcd > > maintainer (CC'ed). In fact, he should have been CC'ed on the original > > patch since it was meant to fix a problem involving xHCI controllers. > > > > With EHCI, for example, if a port status change event occurs while the root > > hub is suspended with wakeups disabled, no interrupt request will be > > generated because the port-specific WKOC_E, WKDSCNNT_E, and WKCNNT_E (Wake > > on Over-Current Enable, Wake on Disconnect Enable, and Wake on Connect > > Enable) bits are turned off. In effect, the port-status change events can > > occur but they aren't treated as wakeup events. > > The port-specific wake flags in xHCI only affects interrupt and wake generation > for a suspended host. In the freeze() to thaw() stage host is running so flags > don't have any effect Is it possible to prevent xHCI from generating an interrupt request if a port change occurs on the root hub while the root hub is suspended but the controller is running? For example, what would happen if the user unplugs a device right in the middle of the freeze transition, after the root hub has been frozen but before the controller is frozen? We don't want such an unplug event to prevent the system from going into hibernation -- especially if the root hub was not enabled for wakeup. (If the root hub _is_ enabled for wakeup then it's questionable. Unplugging a device would be a wakeup event, so you could easily argue that it _should_ prevent the system from going into hibernation. After all, if the unplug happened a few milliseconds later, after the system had fully gone into hibernation, then it would cause the system to wake up.) > Would it make sense prevent xHCI interrupt generation in the host > freeze() stage, clearing the xHCI EINT bit in addition to calling > check_roothub_suspend()? > Then enable it back in thaw() That won't fully eliminate the problem mentioned in the preceding paragraphs, although I guess it would help somewhat. Alan Stern
On 11.4.2022 17.50, Alan Stern wrote: > On Mon, Apr 11, 2022 at 01:43:15PM +0300, Mathias Nyman wrote: >> Hi >> >> On 9.4.2022 4.58, Alan Stern wrote: >>>>> So during freeze, wakeups should be disabled on root hubs. Currently I >>>>> believe we don't do this; if a root hub was already runtime suspended when >>>>> asked to go into freeze, its wakeup setting will remain unchanged. _That_ >> >> In xHCI case freeze will suspend the roothub and make sure all connected devices >> are in suspended U3 state, but it won't prevent interrupts. >> >> And yes, my understanding is also that if devices were runtime suspended with wake >> enabled before freeze, then devices can can initiate resume any time in the first >> stages of hibernate (freeze-thaw), causing an interrupt. > > Yeah. I think we need to change that. > >> We can reduce interrupts by disabling device wake in freeze, but any port change >> can still cause interrupts. > > Are you sure about this? Disabling wakeup for the root-hub device is > supposed to prevent interrupts from being generated when a port-change > event happens. Just tested connecting a device with roothub suspended and host running. Roothub ports had wake on connect disabled. It still caused interrupts. # cat power/runtime_status active # cat usb2/power/runtime_status suspended # dmesg -C # dmesg -w [ 789.341756] xhci_hcd 0000:00:14.0: Port change event, 2-3, id 15, portsc: 0x21203 [ 789.479484] xhci_hcd 0000:00:14.0: Port change event, 2-3, id 15, portsc: 0x201203 > >>>> I think xhci may still find reasons to generate interrupts even if all >>>> of its root hub ports are suspended without wake events. For example, >>>> won't Port Status Change Events still come in if a device is unplugged >>>> or overcurrents in between freeze() and thaw()? >> >> Yes, as long as host is running, and host is running between freeze and thaw. > > It's okay for the events to come in, but if wakeup is disabled on the > root hub then the events should not cause an interrupt request. > >>> I'm not an expert on xHCI or xhci-hcd. For that, we should ask the xhci-hcd >>> maintainer (CC'ed). In fact, he should have been CC'ed on the original >>> patch since it was meant to fix a problem involving xHCI controllers. >>> >>> With EHCI, for example, if a port status change event occurs while the root >>> hub is suspended with wakeups disabled, no interrupt request will be >>> generated because the port-specific WKOC_E, WKDSCNNT_E, and WKCNNT_E (Wake >>> on Over-Current Enable, Wake on Disconnect Enable, and Wake on Connect >>> Enable) bits are turned off. In effect, the port-status change events can >>> occur but they aren't treated as wakeup events. >> >> The port-specific wake flags in xHCI only affects interrupt and wake generation >> for a suspended host. In the freeze() to thaw() stage host is running so flags >> don't have any effect > > Is it possible to prevent xHCI from generating an interrupt request if a > port change occurs on the root hub while the root hub is suspended but > the controller is running? > Didn't find a good way. No new interrupt will be generated by a port change if there is a uncleared previous change bit set in that ports PORTSC register. But there is no way to manually set those bits (write 1 to clear) > For example, what would happen if the user unplugs a device right in the > middle of the freeze transition, after the root hub has been frozen but > before the controller is frozen? We don't want such an unplug event to > prevent the system from going into hibernation -- especially if the root > hub was not enabled for wakeup. We should be able to let system go to hibernate even if we get a disconnect interrupt between roothub and host controller freeze. Host is not yet suspended so no PME# wake is generated, only an interrupt. From Linux PM point of view it should be ok as well as the actual xhci device that is generating the interrupt is hasnt completer freeze() The xhci interrupt handler just needs to make sure that the disconnect isn't propagated if roothub is suspended and wake on disconnect is not set. And definitely make sure xhci doesn't start roothub polling. When freeze() is called for the host we should prevent the host from generating interrupts. > > (If the root hub _is_ enabled for wakeup then it's questionable. > Unplugging a device would be a wakeup event, so you could easily argue > that it _should_ prevent the system from going into hibernation. After > all, if the unplug happened a few milliseconds later, after the system > had fully gone into hibernation, then it would cause the system to wake > up.) > >> Would it make sense prevent xHCI interrupt generation in the host >> freeze() stage, clearing the xHCI EINT bit in addition to calling >> check_roothub_suspend()? >> Then enable it back in thaw() > > That won't fully eliminate the problem mentioned in the preceding > paragraphs, although I guess it would help somewhat. Would the following steps solve this? 1. Disable device initiated resume for connected usb devices in freeze() 2. Don't propagate connect or OC changes if roothub is suspended and port wake flags are disabled. I.E don't kick roothub polling in xhci interrupt handler here. 3 Disable host interrupt generation in host freeze(), and re-enable in thaw() -Mathias
On Tue, Apr 12, 2022 at 05:56:42PM +0300, Mathias Nyman wrote: > On 11.4.2022 17.50, Alan Stern wrote: > > For example, what would happen if the user unplugs a device right in the > > middle of the freeze transition, after the root hub has been frozen but > > before the controller is frozen? We don't want such an unplug event to > > prevent the system from going into hibernation -- especially if the root > > hub was not enabled for wakeup. > > We should be able to let system go to hibernate even if we get a disconnect > interrupt between roothub and host controller freeze. > Host is not yet suspended so no PME# wake is generated, only an interrupt. > > From Linux PM point of view it should be ok as well as the actual xhci > device that is generating the interrupt is hasnt completer freeze() > > The xhci interrupt handler just needs to make sure that the disconnect > isn't propagated if roothub is suspended and wake on disconnect > is not set. And definitely make sure xhci doesn't start roothub polling. > > When freeze() is called for the host we should prevent the host from > generating interrupts. I guess that means adding a new callback. Or we could just suspend the controller, like Evan proposed originally. > > (If the root hub _is_ enabled for wakeup then it's questionable. > > Unplugging a device would be a wakeup event, so you could easily argue > > that it _should_ prevent the system from going into hibernation. After > > all, if the unplug happened a few milliseconds later, after the system > > had fully gone into hibernation, then it would cause the system to wake > > up.) > > > >> Would it make sense prevent xHCI interrupt generation in the host > >> freeze() stage, clearing the xHCI EINT bit in addition to calling > >> check_roothub_suspend()? > >> Then enable it back in thaw() > > > > That won't fully eliminate the problem mentioned in the preceding > > paragraphs, although I guess it would help somewhat. > > Would the following steps solve this? > > 1. Disable device initiated resume for connected usb devices in freeze() > > 2. Don't propagate connect or OC changes if roothub is suspended and port wake > flags are disabled. I.E don't kick roothub polling in xhci interrupt > handler here. I guess you can't just halt the entire host controller when only one of the root hubs is suspended with wakeup disabled. That does complicate things. But you could halt it as soon as both of the root hubs are frozen. Wouldn't that prevent interrupt generation? > 3 Disable host interrupt generation in host freeze(), and re-enable in thaw() And then this step wouldn't be necessary, right? Alan Stern
On 12.4.2022 18.40, Alan Stern wrote: > On Tue, Apr 12, 2022 at 05:56:42PM +0300, Mathias Nyman wrote: >> On 11.4.2022 17.50, Alan Stern wrote: >>> For example, what would happen if the user unplugs a device right in the >>> middle of the freeze transition, after the root hub has been frozen but >>> before the controller is frozen? We don't want such an unplug event to >>> prevent the system from going into hibernation -- especially if the root >>> hub was not enabled for wakeup. >> >> We should be able to let system go to hibernate even if we get a disconnect >> interrupt between roothub and host controller freeze. >> Host is not yet suspended so no PME# wake is generated, only an interrupt. >> >> From Linux PM point of view it should be ok as well as the actual xhci >> device that is generating the interrupt is hasnt completer freeze() >> >> The xhci interrupt handler just needs to make sure that the disconnect >> isn't propagated if roothub is suspended and wake on disconnect >> is not set. And definitely make sure xhci doesn't start roothub polling. >> >> When freeze() is called for the host we should prevent the host from >> generating interrupts. > > I guess that means adding a new callback. Or we could just suspend the > controller, like Evan proposed originally Suspending the host in freeze should work. It will do an extra xhci controller state save stage, but that should be harmless. But is there really a need for the suggested noirq part? + .freeze_noirq = hcd_pci_suspend_noirq, That will try to set the host to PCI D3 state. It seems a bit unnecessary for freeze. > >>> (If the root hub _is_ enabled for wakeup then it's questionable. >>> Unplugging a device would be a wakeup event, so you could easily argue >>> that it _should_ prevent the system from going into hibernation. After >>> all, if the unplug happened a few milliseconds later, after the system >>> had fully gone into hibernation, then it would cause the system to wake >>> up.) >>> >>>> Would it make sense prevent xHCI interrupt generation in the host >>>> freeze() stage, clearing the xHCI EINT bit in addition to calling >>>> check_roothub_suspend()? >>>> Then enable it back in thaw() >>> >>> That won't fully eliminate the problem mentioned in the preceding >>> paragraphs, although I guess it would help somewhat. >> >> Would the following steps solve this? >> >> 1. Disable device initiated resume for connected usb devices in freeze() >> >> 2. Don't propagate connect or OC changes if roothub is suspended and port wake >> flags are disabled. I.E don't kick roothub polling in xhci interrupt >> handler here. > > I guess you can't just halt the entire host controller when only one of > the root hubs is suspended with wakeup disabled. That does complicate > things. But you could halt it as soon as both of the root hubs are > frozen. Wouldn't that prevent interrupt generation? True, but probably easier to just suspend host in freeze() as you stated above. -Mathias
On Thu, Apr 14, 2022 at 05:00:12PM +0300, Mathias Nyman wrote: > On 12.4.2022 18.40, Alan Stern wrote: > > On Tue, Apr 12, 2022 at 05:56:42PM +0300, Mathias Nyman wrote: > >> On 11.4.2022 17.50, Alan Stern wrote: > >>> For example, what would happen if the user unplugs a device right in the > >>> middle of the freeze transition, after the root hub has been frozen but > >>> before the controller is frozen? We don't want such an unplug event to > >>> prevent the system from going into hibernation -- especially if the root > >>> hub was not enabled for wakeup. > >> > >> We should be able to let system go to hibernate even if we get a disconnect > >> interrupt between roothub and host controller freeze. > >> Host is not yet suspended so no PME# wake is generated, only an interrupt. > >> > >> From Linux PM point of view it should be ok as well as the actual xhci > >> device that is generating the interrupt is hasnt completer freeze() > >> > >> The xhci interrupt handler just needs to make sure that the disconnect > >> isn't propagated if roothub is suspended and wake on disconnect > >> is not set. And definitely make sure xhci doesn't start roothub polling. > >> > >> When freeze() is called for the host we should prevent the host from > >> generating interrupts. > > > > I guess that means adding a new callback. Or we could just suspend the > > controller, like Evan proposed originally > > Suspending the host in freeze should work. > It will do an extra xhci controller state save stage, but that should be harmless. > > But is there really a need for the suggested noirq part? > > + .freeze_noirq = hcd_pci_suspend_noirq, > > That will try to set the host to PCI D3 state. > It seems a bit unnecessary for freeze. Agreed. > >>> (If the root hub _is_ enabled for wakeup then it's questionable. > >>> Unplugging a device would be a wakeup event, so you could easily argue > >>> that it _should_ prevent the system from going into hibernation. After > >>> all, if the unplug happened a few milliseconds later, after the system > >>> had fully gone into hibernation, then it would cause the system to wake > >>> up.) > >>> > >>>> Would it make sense prevent xHCI interrupt generation in the host > >>>> freeze() stage, clearing the xHCI EINT bit in addition to calling > >>>> check_roothub_suspend()? > >>>> Then enable it back in thaw() > >>> > >>> That won't fully eliminate the problem mentioned in the preceding > >>> paragraphs, although I guess it would help somewhat. > >> > >> Would the following steps solve this? > >> > >> 1. Disable device initiated resume for connected usb devices in freeze() > >> > >> 2. Don't propagate connect or OC changes if roothub is suspended and port wake > >> flags are disabled. I.E don't kick roothub polling in xhci interrupt > >> handler here. > > > > I guess you can't just halt the entire host controller when only one of > > the root hubs is suspended with wakeup disabled. That does complicate > > things. But you could halt it as soon as both of the root hubs are > > frozen. Wouldn't that prevent interrupt generation? > > True, but probably easier to just suspend host in freeze() as you stated above. Okay. Evan, this discussion suggests that you rewrite your patch as a series of three: 1. Change choose_wakeup() so that for PM_EVENT_FREEZE, wakeup is always disabled. 2. Change the xhci-hcd interrupt handler so that port-status changes are ignored if the port's root hub is suspended with wakeup disabled. 3. As in the original patch, make the .freeze and .thaw callbacks in hcd-pci.c call the appropriate suspend and resume routines, but don't do anything for .freeze_noirq and .thaw_noirq. How does that sound? Alan Stern
Hi Alan and Mathias, On Thu, Apr 14, 2022 at 7:21 AM Alan Stern <stern@rowland.harvard.edu> wrote: > > On Thu, Apr 14, 2022 at 05:00:12PM +0300, Mathias Nyman wrote: > > On 12.4.2022 18.40, Alan Stern wrote: > > > On Tue, Apr 12, 2022 at 05:56:42PM +0300, Mathias Nyman wrote: > > >> On 11.4.2022 17.50, Alan Stern wrote: > > >>> For example, what would happen if the user unplugs a device right in the > > >>> middle of the freeze transition, after the root hub has been frozen but > > >>> before the controller is frozen? We don't want such an unplug event to > > >>> prevent the system from going into hibernation -- especially if the root > > >>> hub was not enabled for wakeup. > > >> > > >> We should be able to let system go to hibernate even if we get a disconnect > > >> interrupt between roothub and host controller freeze. > > >> Host is not yet suspended so no PME# wake is generated, only an interrupt. > > >> > > >> From Linux PM point of view it should be ok as well as the actual xhci > > >> device that is generating the interrupt is hasnt completer freeze() > > >> > > >> The xhci interrupt handler just needs to make sure that the disconnect > > >> isn't propagated if roothub is suspended and wake on disconnect > > >> is not set. And definitely make sure xhci doesn't start roothub polling. > > >> > > >> When freeze() is called for the host we should prevent the host from > > >> generating interrupts. > > > > > > I guess that means adding a new callback. Or we could just suspend the > > > controller, like Evan proposed originally > > > > Suspending the host in freeze should work. > > It will do an extra xhci controller state save stage, but that should be harmless. > > > > But is there really a need for the suggested noirq part? > > > > + .freeze_noirq = hcd_pci_suspend_noirq, > > > > That will try to set the host to PCI D3 state. > > It seems a bit unnecessary for freeze. > > Agreed. > > > >>> (If the root hub _is_ enabled for wakeup then it's questionable. > > >>> Unplugging a device would be a wakeup event, so you could easily argue > > >>> that it _should_ prevent the system from going into hibernation. After > > >>> all, if the unplug happened a few milliseconds later, after the system > > >>> had fully gone into hibernation, then it would cause the system to wake > > >>> up.) > > >>> > > >>>> Would it make sense prevent xHCI interrupt generation in the host > > >>>> freeze() stage, clearing the xHCI EINT bit in addition to calling > > >>>> check_roothub_suspend()? > > >>>> Then enable it back in thaw() > > >>> > > >>> That won't fully eliminate the problem mentioned in the preceding > > >>> paragraphs, although I guess it would help somewhat. > > >> > > >> Would the following steps solve this? > > >> > > >> 1. Disable device initiated resume for connected usb devices in freeze() > > >> > > >> 2. Don't propagate connect or OC changes if roothub is suspended and port wake > > >> flags are disabled. I.E don't kick roothub polling in xhci interrupt > > >> handler here. > > > > > > I guess you can't just halt the entire host controller when only one of > > > the root hubs is suspended with wakeup disabled. That does complicate > > > things. But you could halt it as soon as both of the root hubs are > > > frozen. Wouldn't that prevent interrupt generation? > > > > True, but probably easier to just suspend host in freeze() as you stated above. > > Okay. > > Evan, this discussion suggests that you rewrite your patch as a series > of three: > > 1. Change choose_wakeup() so that for PM_EVENT_FREEZE, wakeup is > always disabled. If I understand this correctly, this means potentially runtime resuming the device so its wakeup setting can be consistently set to wakeups disabled across a freeze transition. Got it I think in terms of the "how". > > 2. Change the xhci-hcd interrupt handler so that port-status > changes are ignored if the port's root hub is suspended with > wakeup disabled. This part confuses me. This would be way deep under xhci_handle_event(), probably in handle_port_status(), just throwing away certain events that come in the ring. How would we know to go back and process those events later? I think we don't need to do this if we suspend the controller as in #3 below. The suspended (halted) controller wouldn't generate event interrupts (since the spec mentions port status change generation is gated on HCHalted). So we're already covered against receiving interrupts in this zone by halting the controller, and the events stay nicely pending for when we restart it in thaw. Is the goal of #1 purely a setup change for #2, or does it stand on its own even if we nixed #2? Said differently, is #1 trying to ensure that wake signaling doesn't occur at all between freeze and thaw, even when the controller is suspended and guaranteed not to generate interrupts via its "normal" mechanism? I don't have a crisp mental picture of how the wake signaling works, but if the controller wake mechanism sidesteps the original problem of sending an MSI to a dead CPU (as in, it does not use MSIs), then it might be ok as-is. > > 3. As in the original patch, make the .freeze and .thaw callbacks > in hcd-pci.c call the appropriate suspend and resume routines, > but don't do anything for .freeze_noirq and .thaw_noirq. Sure. I had made the _noirq paths match suspend for consistency, I wasn't sure if those could mix n match without issues. I'll try it out leaving the _noirq callbacks alone. -Evan > > How does that sound? > > Alan Stern
On 14.4.2022 19.30, Evan Green wrote: > Hi Alan and Mathias, > > On Thu, Apr 14, 2022 at 7:21 AM Alan Stern <stern@rowland.harvard.edu> wrote: >> >> On Thu, Apr 14, 2022 at 05:00:12PM +0300, Mathias Nyman wrote: >>> On 12.4.2022 18.40, Alan Stern wrote: >>>> On Tue, Apr 12, 2022 at 05:56:42PM +0300, Mathias Nyman wrote: >>>>> On 11.4.2022 17.50, Alan Stern wrote: >>>>>> For example, what would happen if the user unplugs a device right in the >>>>>> middle of the freeze transition, after the root hub has been frozen but >>>>>> before the controller is frozen? We don't want such an unplug event to >>>>>> prevent the system from going into hibernation -- especially if the root >>>>>> hub was not enabled for wakeup. >>>>> >>>>> We should be able to let system go to hibernate even if we get a disconnect >>>>> interrupt between roothub and host controller freeze. >>>>> Host is not yet suspended so no PME# wake is generated, only an interrupt. >>>>> >>>>> From Linux PM point of view it should be ok as well as the actual xhci >>>>> device that is generating the interrupt is hasnt completer freeze() >>>>> >>>>> The xhci interrupt handler just needs to make sure that the disconnect >>>>> isn't propagated if roothub is suspended and wake on disconnect >>>>> is not set. And definitely make sure xhci doesn't start roothub polling. >>>>> >>>>> When freeze() is called for the host we should prevent the host from >>>>> generating interrupts. >>>> >>>> I guess that means adding a new callback. Or we could just suspend the >>>> controller, like Evan proposed originally >>> >>> Suspending the host in freeze should work. >>> It will do an extra xhci controller state save stage, but that should be harmless. >>> >>> But is there really a need for the suggested noirq part? >>> >>> + .freeze_noirq = hcd_pci_suspend_noirq, >>> >>> That will try to set the host to PCI D3 state. >>> It seems a bit unnecessary for freeze. >> >> Agreed. >> >>>>>> (If the root hub _is_ enabled for wakeup then it's questionable. >>>>>> Unplugging a device would be a wakeup event, so you could easily argue >>>>>> that it _should_ prevent the system from going into hibernation. After >>>>>> all, if the unplug happened a few milliseconds later, after the system >>>>>> had fully gone into hibernation, then it would cause the system to wake >>>>>> up.) >>>>>> >>>>>>> Would it make sense prevent xHCI interrupt generation in the host >>>>>>> freeze() stage, clearing the xHCI EINT bit in addition to calling >>>>>>> check_roothub_suspend()? >>>>>>> Then enable it back in thaw() >>>>>> >>>>>> That won't fully eliminate the problem mentioned in the preceding >>>>>> paragraphs, although I guess it would help somewhat. >>>>> >>>>> Would the following steps solve this? >>>>> >>>>> 1. Disable device initiated resume for connected usb devices in freeze() >>>>> >>>>> 2. Don't propagate connect or OC changes if roothub is suspended and port wake >>>>> flags are disabled. I.E don't kick roothub polling in xhci interrupt >>>>> handler here. >>>> >>>> I guess you can't just halt the entire host controller when only one of >>>> the root hubs is suspended with wakeup disabled. That does complicate >>>> things. But you could halt it as soon as both of the root hubs are >>>> frozen. Wouldn't that prevent interrupt generation? >>> >>> True, but probably easier to just suspend host in freeze() as you stated above. >> >> Okay. >> >> Evan, this discussion suggests that you rewrite your patch as a series >> of three: >> >> 1. Change choose_wakeup() so that for PM_EVENT_FREEZE, wakeup is >> always disabled. > > If I understand this correctly, this means potentially runtime > resuming the device so its wakeup setting can be consistently set to > wakeups disabled across a freeze transition. Got it I think in terms > of the "how". > >> >> 2. Change the xhci-hcd interrupt handler so that port-status >> changes are ignored if the port's root hub is suspended with >> wakeup disabled. > > This part confuses me. This would be way deep under > xhci_handle_event(), probably in handle_port_status(), just throwing > away certain events that come in the ring. How would we know to go > back and process those events later? I think we don't need to do this > if we suspend the controller as in #3 below. The suspended (halted) > controller wouldn't generate event interrupts (since the spec mentions > port status change generation is gated on HCHalted). So we're already > covered against receiving interrupts in this zone by halting the > controller, and the events stay nicely pending for when we restart it > in thaw. Was thinking the same here. It would be nice to have this to comply with usb spec, keeping roothub from propagating connect/disconnect events immediately after suspending it with wake flags cleared. But it's a lot of work to implement this, and for this issue, and linux hibernate point of view I don't think it has any real benefit. The actual device generating the interrupt is the host (parent of roothub), and that will stop once freeze() is called for it in #3 > > Is the goal of #1 purely a setup change for #2, or does it stand on > its own even if we nixed #2? Said differently, is #1 trying to ensure > that wake signaling doesn't occur at all between freeze and thaw, even > when the controller is suspended and guaranteed not to generate > interrupts via its "normal" mechanism? I don't have a crisp mental > picture of how the wake signaling works, but if the controller wake > mechanism sidesteps the original problem of sending an MSI to a dead > CPU (as in, it does not use MSIs), then it might be ok as-is. #1 is needed because xHCI can generate wake events even when halted if device initiated resume signaling is detected on a roothub port. Just like it can generate wake events on connect/disconnect if wake flags are set. (xhci spec figure 4-34, see PLS=Resume) We want to avoid those wakeups between freeze-thaw So just #1 and #3 should probably solve this, and be an easier change. -Mathias > >> >> 3. As in the original patch, make the .freeze and .thaw callbacks >> in hcd-pci.c call the appropriate suspend and resume routines, >> but don't do anything for .freeze_noirq and .thaw_noirq. > > Sure. I had made the _noirq paths match suspend for consistency, I > wasn't sure if those could mix n match without issues. I'll try it out > leaving the _noirq callbacks alone. > -Evan > >> >> How does that sound? >> >> Alan Stern
On Thu, Apr 14, 2022 at 08:06:32PM +0300, Mathias Nyman wrote: > On 14.4.2022 19.30, Evan Green wrote: > > Hi Alan and Mathias, > > > > On Thu, Apr 14, 2022 at 7:21 AM Alan Stern <stern@rowland.harvard.edu> wrote: > >> Evan, this discussion suggests that you rewrite your patch as a series > >> of three: > >> > >> 1. Change choose_wakeup() so that for PM_EVENT_FREEZE, wakeup is > >> always disabled. > > > > If I understand this correctly, this means potentially runtime > > resuming the device so its wakeup setting can be consistently set to > > wakeups disabled across a freeze transition. The kernel already does this for you. All you have to do is change the routine so that it always decides that wakeup should be off for FREEZE. > > Got it I think in terms > > of the "how". > >> 2. Change the xhci-hcd interrupt handler so that port-status > >> changes are ignored if the port's root hub is suspended with > >> wakeup disabled. > > > > This part confuses me. This would be way deep under > > xhci_handle_event(), probably in handle_port_status(), just throwing > > away certain events that come in the ring. How would we know to go > > back and process those events later? We wouldn't. There's no need to process the events later. When a hub (including a root hub) is resumed, the hub driver checks the state of each port and takes whatever actions are needed to handle any changes that occurred while the hub was suspended. In fact, processing these events wouldnn't really accomplish very much in any case. The driver would request that the root hub be resumed, that request would be submitted to a work queue, and then nothing would happen because the work queue, like many other kernel threads, gets frozen during a hibernation transition. All that's really needed is to guarantee that the root hub would be resumed when we leave hibernation. And of course, this always happens regardless of what events were received in the meantime. > > I think we don't need to do this > > if we suspend the controller as in #3 below. The suspended (halted) > > controller wouldn't generate event interrupts (since the spec mentions > > port status change generation is gated on HCHalted). So we're already > > covered against receiving interrupts in this zone by halting the > > controller, and the events stay nicely pending for when we restart it > > in thaw. > > Was thinking the same here. It would be nice to have this to comply with > usb spec, keeping roothub from propagating connect/disconnect events > immediately after suspending it with wake flags cleared. > > But it's a lot of work to implement this, and for this issue, and linux > hibernate point of view I don't think it has any real benefit. > The actual device generating the interrupt is the host (parent of roothub), > and that will stop once freeze() is called for it in #3 The only reason that approach works is because we never disable resume requests during runtime suspend. But okay... > > Is the goal of #1 purely a setup change for #2, or does it stand on > > its own even if we nixed #2? Said differently, is #1 trying to ensure > > that wake signaling doesn't occur at all between freeze and thaw, even > > when the controller is suspended and guaranteed not to generate > > interrupts via its "normal" mechanism? I don't have a crisp mental > > picture of how the wake signaling works, but if the controller wake > > mechanism sidesteps the original problem of sending an MSI to a dead > > CPU (as in, it does not use MSIs), then it might be ok as-is. > > #1 is needed because xHCI can generate wake events even when halted if > device initiated resume signaling is detected on a roothub port. > Just like it can generate wake events on connect/disconnect if wake flags > are set. (xhci spec figure 4-34, see PLS=Resume) > We want to avoid those wakeups between freeze-thaw Think of it this way: All USB hubs, including root hubs, always relay a resume request upstream when one is received on a downstream port, no matter what their wakeup setting is. A hub's wakeup setting only controls whether it generates a resume request on its own in response to a port-status change. > So just #1 and #3 should probably solve this, and be an easier change. Let's try it and see. Alan Stern
diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c index 8176bc81a635d6..e02506807ffc6c 100644 --- a/drivers/usb/core/hcd-pci.c +++ b/drivers/usb/core/hcd-pci.c @@ -616,10 +616,10 @@ const struct dev_pm_ops usb_hcd_pci_pm_ops = { .suspend_noirq = hcd_pci_suspend_noirq, .resume_noirq = hcd_pci_resume_noirq, .resume = hcd_pci_resume, - .freeze = check_root_hub_suspended, - .freeze_noirq = check_root_hub_suspended, - .thaw_noirq = NULL, - .thaw = NULL, + .freeze = hcd_pci_suspend, + .freeze_noirq = hcd_pci_suspend_noirq, + .thaw_noirq = hcd_pci_resume_noirq, + .thaw = hcd_pci_resume, .poweroff = hcd_pci_suspend, .poweroff_noirq = hcd_pci_suspend_noirq, .restore_noirq = hcd_pci_resume_noirq,
The documentation for the freeze() method says that it "should quiesce the device so that it doesn't generate IRQs or DMA". The unspoken consequence of not doing this is that MSIs aimed at non-boot CPUs may get fully lost if they're sent during the period where the target CPU is offline. The current callbacks for USB HCD do not fully quiesce interrupts, specifically on XHCI. Change to use the full suspend/resume flow for freeze/thaw to ensure interrupts are fully quiesced. This fixes issues where USB devices fail to thaw during hibernation because XHCI misses its interrupt and fails to recover. Signed-off-by: Evan Green <evgreen@chromium.org> --- You may be able to reproduce this issue on your own machine via the following: 1. Disable runtime PM on your XHCI controller 2. Aim your XHCI IRQ at a non-boot CPU (replace 174): echo 2 > /proc/irq/174/smp_affinity 3. Attempt to hibernate (no need to actually go all the way down). I run 2 and 3 in a loop, and can usually hit a hang or dead XHCI controller within 1-2 iterations. I happened to notice this on an Alderlake system where runtime PM is accidentally disabled for one of the XHCI controllers. Some more discussion and debugging can be found at [1]. [1] https://lore.kernel.org/linux-pci/CAE=gft4a-QL82iFJE_xRQ3JrMmz-KZKWREtz=MghhjFbJeK=8A@mail.gmail.com/T/#u --- drivers/usb/core/hcd-pci.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)