[05/10] PCI: portdrv: Resume upon exit from system suspend if left runtime suspended

Message ID 20180906155020.51700-6-mika.westerberg@linux.intel.com
State Not Applicable
Series PCI: Allow D3cold for PCIe hierarchies

Commit Message

Mika Westerberg Sept. 6, 2018, 3:50 p.m. UTC
Currently we try to keep PCIe ports runtime suspended over system
suspend if possible. This mostly happens when entering suspend-to-idle
because there is no need to re-configure wake settings.

This causes problems if the parent port goes into D3cold and it gets
resumed upon exit from system suspend. This may happen for example if
the port is part of PCIe switch and the same switch is connected to a
PCIe endpoint that needs to be resumed. The way exit from D3cold works
according to PCIe 4.0 spec 5.3.1.4.2 is that power is restored and cold
reset is signaled. After this the device is in D0uninitialized state
keeping PME context if it supports wake from D3cold.

The problem occurs when a PCIe hotplug port is left suspended and the
parent port goes into D3cold and back to D0, the port keeps its PME
context but since everything else is reset back to defaults
(D0uninitialized) it is not set to detect hotplug events anymore.

For this reason change the PCIe portdrv power management logic so that
it is fine to keep the port runtime suspended over system suspend but it
needs to be resumed upon exit to make sure it gets properly re-initialized.
The custom ->prepare() hook with DPM_FLAG_SMART_PREPARE is needed
because otherwise pci_pm_prepare() instructs the PM core to go directly
to pci_pm_complete() on resume and this skips resuming the port.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
---
 drivers/pci/pcie/portdrv_pci.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)
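
As a rough illustration of the failure mode described in the commit
message, here is a small user-space C model (not kernel code). All
struct, field and function names are hypothetical. The model assumes only
what the commit message states: exit from D3cold goes through a cold
reset, PME context survives on ports that support wake from D3cold, and
everything else returns to defaults while the OS still considers the port
runtime suspended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of one PCIe hotplug port. */
struct toy_port {
	bool d3cold_wake_capable;  /* supports wake from D3cold */
	bool pme_context;          /* PME context held by the port */
	bool hotplug_events_armed; /* configured to detect hotplug events */
	bool initialized;          /* false models D0uninitialized */
	bool runtime_suspended;    /* what the OS believes about the port */
};

/*
 * Power is restored and cold reset is signaled: everything goes back to
 * defaults, except PME context on ports that can wake from D3cold.
 */
void toy_exit_d3cold(struct toy_port *port)
{
	assert(port != NULL);
	if (!port->d3cold_wake_capable)
		port->pme_context = false;
	port->hotplug_events_armed = false;
	port->initialized = false;
	/* runtime_suspended is OS state, so the reset does not touch it. */
}

/* What resuming the port is assumed to do: reprogram it from scratch. */
void toy_resume(struct toy_port *port)
{
	assert(port != NULL);
	port->hotplug_events_armed = true;
	port->initialized = true;
	port->runtime_suspended = false;
}

/* True when the OS view and the hardware state have diverged. */
bool toy_port_is_stale(const struct toy_port *port)
{
	assert(port != NULL);
	return port->runtime_suspended && !port->initialized;
}
```

Calling toy_exit_d3cold() on a port that is marked runtime suspended
leaves toy_port_is_stale() true until toy_resume() runs, which is the
window in which hotplug events would be missed in this model.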

Comments

Rafael J. Wysocki Sept. 11, 2018, 8 a.m. UTC | #1
On Thu, Sep 6, 2018 at 5:50 PM Mika Westerberg
<mika.westerberg@linux.intel.com> wrote:
>
> Currently we try to keep PCIe ports runtime suspended over system
> suspend if possible. This mostly happens when entering suspend-to-idle
> because there is no need to re-configure wake settings.
>
> This causes problems if the parent port goes into D3cold and it gets
> resumed upon exit from system suspend. This may happen for example if
> the port is part of PCIe switch and the same switch is connected to a
> PCIe endpoint that needs to be resumed. The way exit from D3cold works
> according to PCIe 4.0 spec 5.3.1.4.2 is that power is restored and cold
> reset is signaled. After this the device is in D0uninitialized state
> keeping PME context if it supports wake from D3cold.
>
> The problem occurs when a PCIe hotplug port is left suspended and the
> parent port goes into D3cold and back to D0, the port keeps its PME
> context but since everything else is reset back to defaults
> (D0uninitialized) it is not set to detect hotplug events anymore.
>
> For this reason change the PCIe portdrv power management logic so that
> it is fine to keep the port runtime suspended over system suspend but it
> needs to be resumed upon exit to make sure it gets properly re-initialized.
> The custom ->prepare() hook with DPM_FLAG_SMART_PREPARE is needed
> because otherwise pci_pm_prepare() instructs the PM core to go directly
> to pci_pm_complete() on resume and this skips resuming the port.

Thanks for the detailed explanation, it helps quite a bit!

> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> ---
>  drivers/pci/pcie/portdrv_pci.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index eef22dc29140..74761f660a30 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -43,6 +43,21 @@ __setup("pcie_ports=", pcie_port_setup);
>  /* global data */
>
>  #ifdef CONFIG_PM
> +int pcie_port_prepare(struct device *dev)
> +{
> +       /*
> +        * Return 0 here to indicate PCI core that:
> +        *   - Direct complete path should be avoided
> +        *   - It is OK to leave the port runtime suspended over system
> +        *     suspend
> +        *
> +        * However, the port needs to be resumed afterwards because it may
> +        * have been in D3cold in which case we need to re-initialize the
> +        * hardware as it is in D0uninitialized in that case.
> +        */
> +       return 0;
> +}

You wouldn't need this if you passed DPM_FLAG_NEVER_SKIP to
dev_pm_set_driver_flags() (instead of DPM_FLAG_SMART_SUSPEND), would
you?

> +
>  static int pcie_port_runtime_suspend(struct device *dev)
>  {
>         return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
> @@ -64,6 +79,7 @@ static int pcie_port_runtime_idle(struct device *dev)
>  }
>
>  static const struct dev_pm_ops pcie_portdrv_pm_ops = {
> +       .prepare        = pcie_port_prepare,
>         .suspend        = pcie_port_device_suspend,
>         .resume_noirq   = pcie_port_device_resume_noirq,
>         .resume         = pcie_port_device_resume,
> @@ -109,8 +125,8 @@ static int pcie_portdrv_probe(struct pci_dev *dev,
>
>         pci_save_state(dev);
>
> -       dev_pm_set_driver_flags(&dev->dev, DPM_FLAG_SMART_SUSPEND |
> -                                          DPM_FLAG_LEAVE_SUSPENDED);
> +       dev_pm_set_driver_flags(&dev->dev, DPM_FLAG_SMART_PREPARE |
> +                                          DPM_FLAG_SMART_SUSPEND);
>
>         if (pci_bridge_d3_possible(dev)) {
>                 /*
> --
> 2.18.0
>
Lukas Wunner Sept. 11, 2018, 8:29 a.m. UTC | #2
On Thu, Sep 06, 2018 at 06:50:15PM +0300, Mika Westerberg wrote:
> Currently we try to keep PCIe ports runtime suspended over system
> suspend if possible. This mostly happens when entering suspend-to-idle
> because there is no need to re-configure wake settings.
> 
> This causes problems if the parent port goes into D3cold and it gets
> resumed upon exit from system suspend. This may happen for example if
> the port is part of PCIe switch and the same switch is connected to a
> PCIe endpoint that needs to be resumed. The way exit from D3cold works
> according to PCIe 4.0 spec 5.3.1.4.2 is that power is restored and cold
> reset is signaled. After this the device is in D0uninitialized state
> keeping PME context if it supports wake from D3cold.
> 
> The problem occurs when a PCIe hotplug port is left suspended and the
> parent port goes into D3cold and back to D0, the port keeps its PME
> context but since everything else is reset back to defaults
> (D0uninitialized) it is not set to detect hotplug events anymore.

We call pci_wakeup_bus() in __pci_start_power_transition() for this
reason.  Why isn't that sufficient in your use case?


> For this reason change the PCIe portdrv power management logic so that
> it is fine to keep the port runtime suspended over system suspend but it
> needs to be resumed upon exit to make sure it gets properly re-initialized.
> The custom ->prepare() hook with DPM_FLAG_SMART_PREPARE is needed
> because otherwise pci_pm_prepare() instructs the PM core to go directly
> to pci_pm_complete() on resume and this skips resuming the port.

On Macs, if no Thunderbolt device is attached, it is perfectly okay to
use direct complete and it is also perfectly okay to leave the entire
controller (including all its PCIe ports) in D3cold when coming out of
system sleep.  In fact it would be unnecessary and undesirable to
runtime resume the controller, it would just waste energy for no reason.

Can you make sure that ports stay runtime suspended after system sleep
in that case?

Thanks,

Lukas
Mika Westerberg Sept. 11, 2018, 9:08 a.m. UTC | #3
On Tue, Sep 11, 2018 at 10:29:35AM +0200, Lukas Wunner wrote:
> On Thu, Sep 06, 2018 at 06:50:15PM +0300, Mika Westerberg wrote:
> > Currently we try to keep PCIe ports runtime suspended over system
> > suspend if possible. This mostly happens when entering suspend-to-idle
> > because there is no need to re-configure wake settings.
> > 
> > This causes problems if the parent port goes into D3cold and it gets
> > resumed upon exit from system suspend. This may happen for example if
> > the port is part of PCIe switch and the same switch is connected to a
> > PCIe endpoint that needs to be resumed. The way exit from D3cold works
> > according to PCIe 4.0 spec 5.3.1.4.2 is that power is restored and cold
> > reset is signaled. After this the device is in D0uninitialized state
> > keeping PME context if it supports wake from D3cold.
> > 
> > The problem occurs when a PCIe hotplug port is left suspended and the
> > parent port goes into D3cold and back to D0, the port keeps its PME
> > context but since everything else is reset back to defaults
> > (D0uninitialized) it is not set to detect hotplug events anymore.
> 
> We call pci_wakeup_bus() in __pci_start_power_transition() for this
> reason.  Why isn't that sufficient in your use case?

It would be sufficient otherwise, but __pci_start_power_transition() is
never called because the bridge is left suspended.

> > For this reason change the PCIe portdrv power management logic so that
> > it is fine to keep the port runtime suspended over system suspend but it
> > needs to be resumed upon exit to make sure it gets properly re-initialized.
> > The custom ->prepare() hook with DPM_FLAG_SMART_PREPARE is needed
> > because otherwise pci_pm_prepare() instructs the PM core to go directly
> > to pci_pm_complete() on resume and this skips resuming the port.
> 
> On Macs, if no Thunderbolt device is attached, it is perfectly okay to
> use direct complete and it is also perfectly okay to leave the entire
> controller (including all its PCIe ports) in D3cold when coming out of
> system sleep.  In fact it would be unnecessary and undesirable to
> runtime resume the controller, it would just waste energy for no reason.

I'm surprised if it works like that because both the PCIe spec and the
conventional PCI spec require reset after D3cold, and the only state
for a function after that is D0uninitialized.
Mika Westerberg Sept. 11, 2018, 9:15 a.m. UTC | #4
On Tue, Sep 11, 2018 at 10:00:07AM +0200, Rafael J. Wysocki wrote:
> On Thu, Sep 6, 2018 at 5:50 PM Mika Westerberg
> <mika.westerberg@linux.intel.com> wrote:
> >
> > Currently we try to keep PCIe ports runtime suspended over system
> > suspend if possible. This mostly happens when entering suspend-to-idle
> > because there is no need to re-configure wake settings.
> >
> > This causes problems if the parent port goes into D3cold and it gets
> > resumed upon exit from system suspend. This may happen for example if
> > the port is part of PCIe switch and the same switch is connected to a
> > PCIe endpoint that needs to be resumed. The way exit from D3cold works
> > according to PCIe 4.0 spec 5.3.1.4.2 is that power is restored and cold
> > reset is signaled. After this the device is in D0uninitialized state
> > keeping PME context if it supports wake from D3cold.
> >
> > The problem occurs when a PCIe hotplug port is left suspended and the
> > parent port goes into D3cold and back to D0, the port keeps its PME
> > context but since everything else is reset back to defaults
> > (D0uninitialized) it is not set to detect hotplug events anymore.
> >
> > For this reason change the PCIe portdrv power management logic so that
> > it is fine to keep the port runtime suspended over system suspend but it
> > needs to be resumed upon exit to make sure it gets properly re-initialized.
> > The custom ->prepare() hook with DPM_FLAG_SMART_PREPARE is needed
> > because otherwise pci_pm_prepare() instructs the PM core to go directly
> > to pci_pm_complete() on resume and this skips resuming the port.
> 
> Thanks for the detailed explanation, it helps quite a bit!
> 
> > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > ---
> >  drivers/pci/pcie/portdrv_pci.c | 20 ++++++++++++++++++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> > index eef22dc29140..74761f660a30 100644
> > --- a/drivers/pci/pcie/portdrv_pci.c
> > +++ b/drivers/pci/pcie/portdrv_pci.c
> > @@ -43,6 +43,21 @@ __setup("pcie_ports=", pcie_port_setup);
> >  /* global data */
> >
> >  #ifdef CONFIG_PM
> > +int pcie_port_prepare(struct device *dev)
> > +{
> > +       /*
> > +        * Return 0 here to indicate PCI core that:
> > +        *   - Direct complete path should be avoided
> > +        *   - It is OK to leave the port runtime suspended over system
> > +        *     suspend
> > +        *
> > +        * However, the port needs to be resumed afterwards because it may
> > +        * have been in D3cold in which case we need to re-initialize the
> > +        * hardware as it is in D0uninitialized in that case.
> > +        */
> > +       return 0;
> > +}
> 
> You wouldn't need this if you passed DPM_FLAG_NEVER_SKIP to
> dev_pm_set_driver_flags() (instead of DPM_FLAG_SMART_SUSPEND), would
> you?

Yes, I think it would work. I did not use it here because I thought it
would be better to leave the port runtime suspended during system
suspend (to idle) but in both cases we end up resuming them at some
point (before or after system suspend).

DPM_FLAG_NEVER_SKIP would result in a cleaner patch, I think.
Lukas Wunner Sept. 11, 2018, 9:26 a.m. UTC | #5
On Tue, Sep 11, 2018 at 12:08:17PM +0300, Mika Westerberg wrote:
> On Tue, Sep 11, 2018 at 10:29:35AM +0200, Lukas Wunner wrote:
> > On Thu, Sep 06, 2018 at 06:50:15PM +0300, Mika Westerberg wrote:
> > > Currently we try to keep PCIe ports runtime suspended over system
> > > suspend if possible. This mostly happens when entering suspend-to-idle
> > > because there is no need to re-configure wake settings.
> > > 
> > > This causes problems if the parent port goes into D3cold and it gets
> > > resumed upon exit from system suspend. This may happen for example if
> > > the port is part of PCIe switch and the same switch is connected to a
> > > PCIe endpoint that needs to be resumed. The way exit from D3cold works
> > > according to PCIe 4.0 spec 5.3.1.4.2 is that power is restored and cold
> > > reset is signaled. After this the device is in D0uninitialized state
> > > keeping PME context if it supports wake from D3cold.
> > > 
> > > The problem occurs when a PCIe hotplug port is left suspended and the
> > > parent port goes into D3cold and back to D0, the port keeps its PME
> > > context but since everything else is reset back to defaults
> > > (D0uninitialized) it is not set to detect hotplug events anymore.
> > 
> > We call pci_wakeup_bus() in __pci_start_power_transition() for this
> > reason.  Why isn't that sufficient in your use case?
> 
> It would otherwise but __pci_start_power_transition() is never called
> because the bridge is left suspended.

You write above that the parent goes to D0, now you say it is left
suspended.  Which one is it?


> > > For this reason change the PCIe portdrv power management logic so that
> > > it is fine to keep the port runtime suspended over system suspend but it
> > > needs to be resumed upon exit to make sure it gets properly re-initialized.
> > > The custom ->prepare() hook with DPM_FLAG_SMART_PREPARE is needed
> > > because otherwise pci_pm_prepare() instructs the PM core to go directly
> > > to pci_pm_complete() on resume and this skips resuming the port.
> > 
> > On Macs, if no Thunderbolt device is attached, it is perfectly okay to
> > use direct complete and it is also perfectly okay to leave the entire
> > controller (including all its PCIe ports) in D3cold when coming out of
> > system sleep.  In fact it would be unnecessary and undesirable to
> > runtime resume the controller, it would just waste energy for no reason.
> 
> I'm surprised if it works like that because both PCIe spec and
> conventional PCI spec both require reset after D3cold and the only state
> for a function after that is D0uninitialized.

The controller is in D3cold when coming out of system sleep if it was in
D3cold when system sleep commenced.

Thanks,

Lukas
Mika Westerberg Sept. 11, 2018, 9:41 a.m. UTC | #6
On Tue, Sep 11, 2018 at 11:26:30AM +0200, Lukas Wunner wrote:
> On Tue, Sep 11, 2018 at 12:08:17PM +0300, Mika Westerberg wrote:
> > On Tue, Sep 11, 2018 at 10:29:35AM +0200, Lukas Wunner wrote:
> > > On Thu, Sep 06, 2018 at 06:50:15PM +0300, Mika Westerberg wrote:
> > > > Currently we try to keep PCIe ports runtime suspended over system
> > > > suspend if possible. This mostly happens when entering suspend-to-idle
> > > > because there is no need to re-configure wake settings.
> > > > 
> > > > This causes problems if the parent port goes into D3cold and it gets
> > > > resumed upon exit from system suspend. This may happen for example if
> > > > the port is part of PCIe switch and the same switch is connected to a
> > > > PCIe endpoint that needs to be resumed. The way exit from D3cold works
> > > > according to PCIe 4.0 spec 5.3.1.4.2 is that power is restored and cold
> > > > reset is signaled. After this the device is in D0uninitialized state
> > > > keeping PME context if it supports wake from D3cold.
> > > > 
> > > > The problem occurs when a PCIe hotplug port is left suspended and the
> > > > parent port goes into D3cold and back to D0, the port keeps its PME
> > > > context but since everything else is reset back to defaults
> > > > (D0uninitialized) it is not set to detect hotplug events anymore.
> > > 
> > > We call pci_wakeup_bus() in __pci_start_power_transition() for this
> > > reason.  Why isn't that sufficient in your use case?
> > 
> > It would otherwise but __pci_start_power_transition() is never called
> > because the bridge is left suspended.
> 
> You write above that the parent goes to D0, now you say it is left
> suspended.  Which one is it?

The port below the parent is left suspended -- the hotplug port.

Once the upstream port of a PCIe switch is resumed to D0 it will be
reset and that reset is propagated to other ports in that switch so it
makes the hotplug port go to D0uninitialized as well.

> > > > For this reason change the PCIe portdrv power management logic so that
> > > > it is fine to keep the port runtime suspended over system suspend but it
> > > > needs to be resumed upon exit to make sure it gets properly re-initialized.
> > > > The custom ->prepare() hook with DPM_FLAG_SMART_PREPARE is needed
> > > > because otherwise pci_pm_prepare() instructs the PM core to go directly
> > > > to pci_pm_complete() on resume and this skips resuming the port.
> > > 
> > > On Macs, if no Thunderbolt device is attached, it is perfectly okay to
> > > use direct complete and it is also perfectly okay to leave the entire
> > > controller (including all its PCIe ports) in D3cold when coming out of
> > > system sleep.  In fact it would be unnecessary and undesirable to
> > > runtime resume the controller, it would just waste energy for no reason.
> > 
> > I'm surprised if it works like that because both PCIe spec and
> > conventional PCI spec both require reset after D3cold and the only state
> > for a function after that is D0uninitialized.
> 
> The controller is in D3cold when coming out of system sleep if it was in
> > D3cold when system sleep commenced.

In that case if the whole chain is left in D3cold over system sleep
there is no issue. But at least with the Thunderbolt controllers I've
been dealing with, the xHCI (or the USB stack) that is part of the PCIe
switch wants to resume the controller and that also resumes the upstream
port and the root port, which leads to this situation. I did not check
Apple systems, though, but I thought they also include xHCI.

I guess the point is if there is any device in the hierarchy that needs
to be resumed, we end up in this situation.
Lukas Wunner Sept. 11, 2018, 9:53 a.m. UTC | #7
On Tue, Sep 11, 2018 at 12:41:44PM +0300, Mika Westerberg wrote:
> On Tue, Sep 11, 2018 at 11:26:30AM +0200, Lukas Wunner wrote:
> > On Tue, Sep 11, 2018 at 12:08:17PM +0300, Mika Westerberg wrote:
> > > On Tue, Sep 11, 2018 at 10:29:35AM +0200, Lukas Wunner wrote:
> > > > On Thu, Sep 06, 2018 at 06:50:15PM +0300, Mika Westerberg wrote:
> > > > > Currently we try to keep PCIe ports runtime suspended over system
> > > > > suspend if possible. This mostly happens when entering suspend-to-idle
> > > > > because there is no need to re-configure wake settings.
> > > > > 
> > > > > This causes problems if the parent port goes into D3cold and it gets
> > > > > resumed upon exit from system suspend. This may happen for example if
> > > > > the port is part of PCIe switch and the same switch is connected to a
> > > > > PCIe endpoint that needs to be resumed. The way exit from D3cold works
> > > > > according to PCIe 4.0 spec 5.3.1.4.2 is that power is restored and cold
> > > > > reset is signaled. After this the device is in D0uninitialized state
> > > > > keeping PME context if it supports wake from D3cold.
> > > > > 
> > > > > The problem occurs when a PCIe hotplug port is left suspended and the
> > > > > parent port goes into D3cold and back to D0, the port keeps its PME
> > > > > context but since everything else is reset back to defaults
> > > > > (D0uninitialized) it is not set to detect hotplug events anymore.
> > > > 
> > > > We call pci_wakeup_bus() in __pci_start_power_transition() for this
> > > > reason.  Why isn't that sufficient in your use case?
> > > 
> > > It would otherwise but __pci_start_power_transition() is never called
> > > because the bridge is left suspended.
> > 
> > You write above that the parent goes to D0, now you say it is left
> > suspended.  Which one is it?
> 
> The port below the parent is left suspended -- the hotplug port.
> 
> Once the upstream port of a PCIe switch is resumed to D0 it will be
> reset and that reset is propagated to other ports in that switch so it
> makes the hotplug port go to D0uninitialized as well.

Yes, but as said when the PCI core runtime resumes the upstream port to D0,
__pci_start_power_transition() should call pci_wakeup_bus() to wake all
devices on its subordinate bus, i.e. all the Downstream Ports, including
hotplug ports.  So they're runtime resumed as well and thus pass from
D0uninitialized to D0initialized.  Is pci_wakeup_bus() not called in your
case, and if so, why not?


> > > > > For this reason change the PCIe portdrv power management logic so that
> > > > > it is fine to keep the port runtime suspended over system suspend but it
> > > > > needs to be resumed upon exit to make sure it gets properly re-initialized.
> > > > > The custom ->prepare() hook with DPM_FLAG_SMART_PREPARE is needed
> > > > > because otherwise pci_pm_prepare() instructs the PM core to go directly
> > > > > to pci_pm_complete() on resume and this skips resuming the port.
> > > > 
> > > > On Macs, if no Thunderbolt device is attached, it is perfectly okay to
> > > > use direct complete and it is also perfectly okay to leave the entire
> > > > controller (including all its PCIe ports) in D3cold when coming out of
> > > > system sleep.  In fact it would be unnecessary and undesirable to
> > > > runtime resume the controller, it would just waste energy for no reason.
> > > 
> > > I'm surprised if it works like that because both PCIe spec and
> > > conventional PCI spec both require reset after D3cold and the only state
> > > for a function after that is D0uninitialized.
> > 
> > The controller is in D3cold when coming out of system sleep if it was in
> > > D3cold when system sleep commenced.
> 
> In that case if the whole chain is left in D3cold over system sleep
> there is no issue. But at least with the Thunderbolt controllers I've
> been dealing with, the xHCI (or the USB stack) that is part of the PCIe
> switch wants to resume the controller and that also resumes the upstream
> port and the root port which leads to this situation. I did not check
> Apple systems, though but I thought they also include xHCI.

They do not if they're older than Alpine Ridge of course.

Thanks,

Lukas
Mika Westerberg Sept. 11, 2018, 10:23 a.m. UTC | #8
On Tue, Sep 11, 2018 at 11:53:40AM +0200, Lukas Wunner wrote:
> On Tue, Sep 11, 2018 at 12:41:44PM +0300, Mika Westerberg wrote:
> > On Tue, Sep 11, 2018 at 11:26:30AM +0200, Lukas Wunner wrote:
> > > On Tue, Sep 11, 2018 at 12:08:17PM +0300, Mika Westerberg wrote:
> > > > On Tue, Sep 11, 2018 at 10:29:35AM +0200, Lukas Wunner wrote:
> > > > > On Thu, Sep 06, 2018 at 06:50:15PM +0300, Mika Westerberg wrote:
> > > > > > Currently we try to keep PCIe ports runtime suspended over system
> > > > > > suspend if possible. This mostly happens when entering suspend-to-idle
> > > > > > because there is no need to re-configure wake settings.
> > > > > > 
> > > > > > This causes problems if the parent port goes into D3cold and it gets
> > > > > > resumed upon exit from system suspend. This may happen for example if
> > > > > > the port is part of PCIe switch and the same switch is connected to a
> > > > > > PCIe endpoint that needs to be resumed. The way exit from D3cold works
> > > > > > according to PCIe 4.0 spec 5.3.1.4.2 is that power is restored and cold
> > > > > > reset is signaled. After this the device is in D0uninitialized state
> > > > > > keeping PME context if it supports wake from D3cold.
> > > > > > 
> > > > > > The problem occurs when a PCIe hotplug port is left suspended and the
> > > > > > parent port goes into D3cold and back to D0, the port keeps its PME
> > > > > > context but since everything else is reset back to defaults
> > > > > > (D0uninitialized) it is not set to detect hotplug events anymore.
> > > > > 
> > > > > We call pci_wakeup_bus() in __pci_start_power_transition() for this
> > > > > reason.  Why isn't that sufficient in your use case?
> > > > 
> > > > It would otherwise but __pci_start_power_transition() is never called
> > > > because the bridge is left suspended.
> > > 
> > > You write above that the parent goes to D0, now you say it is left
> > > suspended.  Which one is it?
> > 
> > The port below the parent is left suspended -- the hotplug port.
> > 
> > Once the upstream port of a PCIe switch is resumed to D0 it will be
> > reset and that reset is propagated to other ports in that switch so it
> > makes the hotplug port go to D0uninitialized as well.
> 
> Yes, but as said when the PCI core runtime resumes the upstream port to D0,
> __pci_start_power_transition() should call pci_wakeup_bus() to wake all
> devices on its subordinate bus, i.e. all the Downstream Ports, including
> hotplug ports.  So they're runtime resumed as well and thus pass from
> D0uninitialized to D0initialized.  Is pci_wakeup_bus() not called in your
> case, and if so, why not?

If I read the PCI PM code right, the way back to D0 happens like this:

   pci_pm_resume_noirq()
     pci_pm_default_resume_early()
       pci_power_up()
         pci_raw_set_power_state(dev, PCI_D0)

To me it looks like __pci_start_power_transition() is not called in the
resume path.
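
To make the difference between the two paths concrete, here is a toy
user-space model in C (not kernel code). It assumes only what has been said
in this thread: leaving D3cold resets every port of the switch, the system
resume chain above ends in pci_raw_set_power_state() for the upstream port
alone, and a path through __pci_start_power_transition() would also wake
the subordinate bus via pci_wakeup_bus(). All sim_* names are hypothetical,
and the real pci_wakeup_bus() behavior is reduced here to an immediate
resume of each downstream port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_PORTS 4

/* Hypothetical model of one port: hardware state plus the OS's view. */
struct sim_port {
	bool initialized;        /* false models D0uninitialized */
	bool runtime_suspended;  /* runtime PM status as seen by the OS */
};

/* Hypothetical switch: one upstream port and its downstream ports. */
struct sim_switch {
	struct sim_port upstream;
	struct sim_port downstream[SIM_MAX_PORTS];
	size_t n_downstream;
};

/* Re-initialize a port and mark it active. */
static void sim_resume_port(struct sim_port *port)
{
	port->initialized = true;
	port->runtime_suspended = false;
}

/* Leaving D3cold resets the whole switch to D0uninitialized. */
static void sim_cold_reset(struct sim_switch *sw)
{
	sw->upstream.initialized = false;
	for (size_t i = 0; i < sw->n_downstream; i++)
		sw->downstream[i].initialized = false;
}

/* System resume path: only the upstream port itself is brought back. */
void sim_system_resume_upstream(struct sim_switch *sw)
{
	assert(sw != NULL && sw->n_downstream <= SIM_MAX_PORTS);
	sim_cold_reset(sw);
	sim_resume_port(&sw->upstream);
}

/* Path that also wakes the subordinate bus (stand-in for pci_wakeup_bus()). */
void sim_runtime_resume_upstream(struct sim_switch *sw)
{
	assert(sw != NULL && sw->n_downstream <= SIM_MAX_PORTS);
	sim_cold_reset(sw);
	sim_resume_port(&sw->upstream);
	for (size_t i = 0; i < sw->n_downstream; i++)
		sim_resume_port(&sw->downstream[i]);
}

/* Count downstream ports the OS thinks are suspended but which were reset. */
size_t sim_count_stale(const struct sim_switch *sw)
{
	size_t stale = 0;

	assert(sw != NULL && sw->n_downstream <= SIM_MAX_PORTS);
	for (size_t i = 0; i < sw->n_downstream; i++)
		if (sw->downstream[i].runtime_suspended &&
		    !sw->downstream[i].initialized)
			stale++;
	return stale;
}
```

In this model sim_system_resume_upstream() leaves every suspended
downstream port stale, while sim_runtime_resume_upstream() leaves none,
which is the gap between the two code paths being discussed.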
Rafael J. Wysocki Sept. 11, 2018, 10:33 a.m. UTC | #9
On Tuesday, September 11, 2018 11:15:20 AM CEST Mika Westerberg wrote:
> On Tue, Sep 11, 2018 at 10:00:07AM +0200, Rafael J. Wysocki wrote:
> > On Thu, Sep 6, 2018 at 5:50 PM Mika Westerberg
> > <mika.westerberg@linux.intel.com> wrote:
> > >
> > > Currently we try to keep PCIe ports runtime suspended over system
> > > suspend if possible. This mostly happens when entering suspend-to-idle
> > > because there is no need to re-configure wake settings.
> > >
> > > This causes problems if the parent port goes into D3cold and it gets
> > > resumed upon exit from system suspend. This may happen for example if
> > > the port is part of PCIe switch and the same switch is connected to a
> > > PCIe endpoint that needs to be resumed. The way exit from D3cold works
> > > according to PCIe 4.0 spec 5.3.1.4.2 is that power is restored and cold
> > > reset is signaled. After this the device is in D0uninitialized state
> > > keeping PME context if it supports wake from D3cold.
> > >
> > > The problem occurs when a PCIe hotplug port is left suspended and the
> > > parent port goes into D3cold and back to D0, the port keeps its PME
> > > context but since everything else is reset back to defaults
> > > (D0uninitialized) it is not set to detect hotplug events anymore.
> > >
> > > For this reason change the PCIe portdrv power management logic so that
> > > it is fine to keep the port runtime suspended over system suspend but it
> > > needs to be resumed upon exit to make sure it gets properly re-initialized.
> > > The custom ->prepare() hook with DPM_FLAG_SMART_PREPARE is needed
> > > because otherwise pci_pm_prepare() instructs the PM core to go directly
> > > to pci_pm_complete() on resume and this skips resuming the port.
> > 
> > Thanks for the detailed explanation, it helps quite a bit!
> > 
> > > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > ---
> > >  drivers/pci/pcie/portdrv_pci.c | 20 ++++++++++++++++++--
> > >  1 file changed, 18 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> > > index eef22dc29140..74761f660a30 100644
> > > --- a/drivers/pci/pcie/portdrv_pci.c
> > > +++ b/drivers/pci/pcie/portdrv_pci.c
> > > @@ -43,6 +43,21 @@ __setup("pcie_ports=", pcie_port_setup);
> > >  /* global data */
> > >
> > >  #ifdef CONFIG_PM
> > > +int pcie_port_prepare(struct device *dev)
> > > +{
> > > +       /*
> > > +        * Return 0 here to indicate PCI core that:
> > > +        *   - Direct complete path should be avoided
> > > +        *   - It is OK to leave the port runtime suspended over system
> > > +        *     suspend
> > > +        *
> > > +        * However, the port needs to be resumed afterwards because it may
> > > +        * have been in D3cold in which case we need to re-initialize the
> > > +        * hardware as it is in D0uninitialized in that case.
> > > +        */
> > > +       return 0;
> > > +}
> > 
> > You wouldn't need this if you passed DPM_FLAG_NEVER_SKIP to
> > dev_pm_set_driver_flags() (instead of DPM_FLAG_SMART_SUSPEND), would

I should have said "instead of DPM_FLAG_SMART_PREPARE" here, sorry.

> > you?
> 
> Yes, I think it would work. I did not use it here because I thought it
> would be better to leave the port runtime suspended during system
> suspend (to idle) but in both cases we end up resuming them at some
> point (before or after system suspend).

DPM_FLAG_NEVER_SKIP only affects the direct-complete optimization.

It has no effect on the other behavior, so it is valid to have both
DPM_FLAG_NEVER_SKIP and DPM_FLAG_SMART_SUSPEND set at the same time
which I think is what you want.

[On the other hand, if DPM_FLAG_NEVER_SKIP is set, DPM_FLAG_SMART_PREPARE
has no effect.]

> DPM_FLAG_NEVER_SKIP would result in a cleaner patch, I think.
> 

Well, that's my point. :-)
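
The flag interplay described above can be modeled in user space roughly as
follows. This is a simplified sketch, not the kernel implementation: the
MODEL_* constants and model_* functions are hypothetical stand-ins, and the
logic reflects only what this thread says about the bus-level prepare step,
the driver's ->prepare() return value and the direct-complete decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DPM_FLAG_* driver flags. */
enum {
	MODEL_NEVER_SKIP    = 1u << 0,
	MODEL_SMART_PREPARE = 1u << 1,
	MODEL_SMART_SUSPEND = 1u << 2,
};

struct model_dev {
	unsigned int flags;       /* driver flags set at probe time */
	bool has_prepare;         /* driver provides a ->prepare() hook */
	int prepare_ret;          /* what that hook returns */
	bool runtime_suspended;   /* runtime PM status when suspend starts */
	bool bus_keep_suspended;  /* bus code is happy to leave it suspended */
};

/* Models the bus-level prepare step: > 0 asks for direct-complete. */
int model_bus_prepare(const struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->has_prepare) {
		if (dev->prepare_ret < 0)
			return dev->prepare_ret;
		/* With SMART_PREPARE, a 0 from the driver is passed on as 0. */
		if (dev->prepare_ret == 0 &&
		    (dev->flags & MODEL_SMART_PREPARE))
			return 0;
	}
	return dev->bus_keep_suspended ? 1 : 0;
}

/* Models the core's decision whether to use direct-complete. */
bool model_direct_complete(const struct model_dev *dev)
{
	int ret = model_bus_prepare(dev);

	/* NEVER_SKIP overrides everything else, including SMART_PREPARE. */
	if (dev->flags & MODEL_NEVER_SKIP)
		return false;
	return dev->runtime_suspended && ret > 0;
}
```

In this model a runtime-suspended port with only MODEL_SMART_SUSPEND set
takes the direct-complete path, while either the ->prepare() hook returning
0 with MODEL_SMART_PREPARE or MODEL_NEVER_SKIP on its own prevents it.
MODEL_SMART_SUSPEND plays no part in the decision, which matches the remark
that it can be combined with DPM_FLAG_NEVER_SKIP.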
Mika Westerberg Sept. 11, 2018, 10:41 a.m. UTC | #10
On Tue, Sep 11, 2018 at 12:33:46PM +0200, Rafael J. Wysocki wrote:
> On Tuesday, September 11, 2018 11:15:20 AM CEST Mika Westerberg wrote:
> > On Tue, Sep 11, 2018 at 10:00:07AM +0200, Rafael J. Wysocki wrote:
> > > On Thu, Sep 6, 2018 at 5:50 PM Mika Westerberg
> > > <mika.westerberg@linux.intel.com> wrote:
> > > >
> > > > Currently we try to keep PCIe ports runtime suspended over system
> > > > suspend if possible. This mostly happens when entering suspend-to-idle
> > > > because there is no need to re-configure wake settings.
> > > >
> > > > This causes problems if the parent port goes into D3cold and it gets
> > > > resumed upon exit from system suspend. This may happen for example if
> > > > the port is part of PCIe switch and the same switch is connected to a
> > > > PCIe endpoint that needs to be resumed. The way exit from D3cold works
> > > > according to PCIe 4.0 spec 5.3.1.4.2 is that power is restored and cold
> > > > reset is signaled. After this the device is in D0uninitialized state
> > > > keeping PME context if it supports wake from D3cold.
> > > >
> > > > The problem occurs when a PCIe hotplug port is left suspended and the
> > > > parent port goes into D3cold and back to D0, the port keeps its PME
> > > > context but since everything else is reset back to defaults
> > > > (D0unitialized) it is not set to detect hotplug events anymore.
> > > >
> > > > For this reason change the PCIe portdrv power management logic so that
> > > > it is fine to keep the port runtime suspended over system suspend but it
> > > > needs to be resumed upon exit to make sure it gets properly re-initialized.
> > > > The custom ->prepare() hook with DPM_FLAG_SMART_PREPARE is needed
> > > > because otherwise pci_pm_prepare() instructs the PM core to go directly
> > > > to pci_pm_complete() on resume and this skips resuming the port.
> > > 
> > > Thanks for the detailed explanation, it helps quite a bit!
> > > 
> > > > Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
> > > > ---
> > > >  drivers/pci/pcie/portdrv_pci.c | 20 ++++++++++++++++++--
> > > >  1 file changed, 18 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> > > > index eef22dc29140..74761f660a30 100644
> > > > --- a/drivers/pci/pcie/portdrv_pci.c
> > > > +++ b/drivers/pci/pcie/portdrv_pci.c
> > > > @@ -43,6 +43,21 @@ __setup("pcie_ports=", pcie_port_setup);
> > > >  /* global data */
> > > >
> > > >  #ifdef CONFIG_PM
> > > > +int pcie_port_prepare(struct device *dev)
> > > > +{
> > > > +       /*
> > > > +        * Return 0 here to indicate to the PCI core that:
> > > > +        *   - Direct complete path should be avoided
> > > > +        *   - It is OK to leave the port runtime suspended over system
> > > > +        *     suspend
> > > > +        *
> > > > +        * However, the port needs to be resumed afterwards because it may
> > > > +        * have been in D3cold in which case we need to re-initialize the
> > > > +        * hardware as it is in D0uninitialized in that case.
> > > > +        */
> > > > +       return 0;
> > > > +}
> > > 
> > > You wouldn't need this if you passed DPM_FLAG_NEVER_SKIP to
> > > dev_pm_set_driver_flags() (instead of DPM_FLAG_SMART_SUSPEND), would
> 
> I should have said "instead of DPM_FLAG_SMART_PREPARE" here, sorry.
> 
> > > you?
> > 
> > Yes, I think it would work. I did not use it here because I thought it
> > would be better to leave the port runtime suspended during system
> > suspend (to idle) but in both cases we end up resuming them at some
> > point (before or after system suspend).
> 
> DPM_FLAG_NEVER_SKIP only affects the direct-complete optimization.

I did not know that. Thanks for the clarification. :)

> It has no effect on the other behavior, so it is valid to have both
> DPM_FLAG_NEVER_SKIP and DPM_FLAG_SMART_SUSPEND set at the same time
> which I think is what you want.

Indeed, that sounds like the right combination of flags.

Patch

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index eef22dc29140..74761f660a30 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -43,6 +43,21 @@  __setup("pcie_ports=", pcie_port_setup);
 /* global data */
 
 #ifdef CONFIG_PM
+int pcie_port_prepare(struct device *dev)
+{
+	/*
+	 * Return 0 here to indicate to the PCI core that:
+	 *   - Direct complete path should be avoided
+	 *   - It is OK to leave the port runtime suspended over system
+	 *     suspend
+	 *
+	 * However, the port needs to be resumed afterwards because it may
+	 * have been in D3cold in which case we need to re-initialize the
+	 * hardware as it is in D0uninitialized in that case.
+	 */
+	return 0;
+}
+
 static int pcie_port_runtime_suspend(struct device *dev)
 {
 	return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
@@ -64,6 +79,7 @@  static int pcie_port_runtime_idle(struct device *dev)
 }
 
 static const struct dev_pm_ops pcie_portdrv_pm_ops = {
+	.prepare	= pcie_port_prepare,
 	.suspend	= pcie_port_device_suspend,
 	.resume_noirq	= pcie_port_device_resume_noirq,
 	.resume		= pcie_port_device_resume,
@@ -109,8 +125,8 @@  static int pcie_portdrv_probe(struct pci_dev *dev,
 
 	pci_save_state(dev);
 
-	dev_pm_set_driver_flags(&dev->dev, DPM_FLAG_SMART_SUSPEND |
-					   DPM_FLAG_LEAVE_SUSPENDED);
+	dev_pm_set_driver_flags(&dev->dev, DPM_FLAG_SMART_PREPARE |
+					   DPM_FLAG_SMART_SUSPEND);
 
 	if (pci_bridge_d3_possible(dev)) {
 		/*