[1/2] PCI/DPC: Disable interrupt generation during suspend

Message ID 20180322193630.GB252023@bhelgaas-glaptop.roam.corp.google.com (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas

Commit Message

Bjorn Helgaas March 22, 2018, 7:36 p.m. UTC
On Thu, Mar 22, 2018 at 06:39:03PM +0100, Lukas Wunner wrote:
> On Thu, Mar 22, 2018 at 06:53:17PM +0200, Mika Westerberg wrote:
> > On Thu, Mar 22, 2018 at 11:45:17AM +0100, Lukas Wunner wrote:
> > > Now I've thought of one.
> > > 
> > > The port may have more children besides the port service devices,
> > > namely all the PCI devices below the port.  The PM core doesn't
> > > impose a specific ordering on suspend/resume but will try to
> > > parallelize among all the children.
> > > 
> > > Usually that's not what you want.  On resume, you want to resume
> > > the port itself (including its port services) *before* resuming
> > > the PCI child devices.  And the other way round on suspend.
> > 
> > That's a good point.
> > 
> > So I guess there is no way to avoid adding suspend_late/resume_early
> > callbacks to the pcie port service structure. I'll do that in the next
> > revision.
> 
> Well, there *are* ways to avoid it but they might not be better.
> 
> Iterating over the port services' callbacks is equivalent to ordering
> the port service devices after the port's PCI device but before its
> PCI child devices in devices_kset.
> 
> That can also be achieved by adding a device link from every PCI child
> device (consumer) to every port service device (provider).  The result
> however is a combinatorial explosion.  Say you've got 64 downstream
> bridges in a PCIe switch and the upstream bridge has 3 port services,
> that's 3 x 64 = 192 device links.  That's probably clumsier than
> iterating over the port services.
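
For illustration, a rough sketch of the device-link approach described
above; pcie_port_link_children() and link_one_service() are hypothetical
names, while device_link_add(), device_for_each_child() and
pcie_port_bus_type are existing kernel interfaces as of v4.16:

#include <linux/device.h>
#include <linux/pci.h>

#include "portdrv.h"	/* for pcie_port_bus_type */

/* Make one PCI child device (consumer) depend on one service (supplier). */
static int link_one_service(struct device *svc, void *data)
{
	struct device *consumer = data;

	if (svc->bus == &pcie_port_bus_type)
		device_link_add(consumer, svc, DL_FLAG_AUTOREMOVE);

	return 0;
}

/*
 * Link every PCI device below the port to every port service device.
 * With N children and M services this creates N * M links, which is the
 * combinatorial explosion mentioned above.  Locking elided for brevity.
 */
static void pcie_port_link_children(struct pci_dev *port)
{
	struct pci_dev *child;

	if (!port->subordinate)
		return;

	list_for_each_entry(child, &port->subordinate->devices, bus_list)
		device_for_each_child(&port->dev, &child->dev,
				      link_one_service);
}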

I hope we can avoid adding suspend_late/resume_early callbacks in
struct pcie_port_service_driver, and I also hope we can avoid adding
device links.  Those both sound pretty complicated.

Can you do something like the patch below, which does something
similar for PME?


commit 6c4dfc1389e1
Author: Bjorn Helgaas <bhelgaas@google.com>
Date:   Fri Mar 9 11:06:54 2018 -0600

    PCI/PM: Clear PCIe PME Status bit in core, not PCIe port driver
    
    fe31e69740ed ("PCI/PCIe: Clear Root PME Status bits early during system
    resume") added a .resume_noirq() callback to the PCIe port driver to clear
    the PME Status bit during resume to work around a BIOS issue.
    
    The BIOS evidently enabled PME interrupts for ACPI-based runtime wakeups
    but did not clear the PME Status bit during resume, which meant PMEs after
    resume did not trigger interrupts because PME Status did not transition
    from cleared to set.
    
    The fix was in the PCIe port driver, so it worked when CONFIG_PCIEPORTBUS
    was set.  But I think we *always* want the fix because the platform may use
    PME interrupts even if Linux is built without the PCIe port driver.
    
    Move the fix from the port driver to the PCI core so we can work around
    this "PME doesn't work after waking from a sleep state" issue regardless of
    CONFIG_PCIEPORTBUS.
    
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>

Comments

Mika Westerberg March 23, 2018, 11:18 a.m. UTC | #1
On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> I hope we can avoid adding suspend_late/resume_early callbacks in
> struct pcie_port_service_driver, and I also hope we can avoid adding
> device links.  Those both sound pretty complicated.
> 
> Can you do something like the patch below, which does something
> similar for PME?

AFAICT the core PCI PM code follows the same ordering as the PM core
does, so it may be possible that not all service drivers get
resumed/suspended before other children (PCI buses).  Basically this
would be the same as just using core PM ops in the DPC driver (in
which case DPC-specific things are still kept in the DPC driver, not
in the PCI core).

If we do not care about that, then I think both options are pretty
straightforward to implement.
Bjorn Helgaas March 23, 2018, 9:08 p.m. UTC | #2
On Fri, Mar 23, 2018 at 01:18:56PM +0200, Mika Westerberg wrote:
> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> > I hope we can avoid adding suspend_late/resume_early callbacks in
> > struct pcie_port_service_driver, and I also hope we can avoid adding
> > device links.  Those both sound pretty complicated.
> > 
> > Can you do something like the patch below, which does something
> > similar for PME?
> 
> AFAICT the core PCI PM code follows the same ordering as the PM core
> does, so it may be possible that not all service drivers get
> resumed/suspended before other children (PCI buses).  Basically this
> would be the same as just using core PM ops in the DPC driver (in
> which case DPC-specific things are still kept in the DPC driver, not
> in the PCI core).

I'm not sure I follow this.  I assume the core PCI PM code guarantees
that a bridge is suspended after its children and resumed before them.
Are you saying that's not the case?
Rafael J. Wysocki March 23, 2018, 9:11 p.m. UTC | #3
On Fri, Mar 23, 2018 at 10:08 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Mar 23, 2018 at 01:18:56PM +0200, Mika Westerberg wrote:
>> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
>> > I hope we can avoid adding suspend_late/resume_early callbacks in
>> > struct pcie_port_service_driver, and I also hope we can avoid adding
>> > device links.  Those both sound pretty complicated.
>> >
>> > Can you do something like the patch below, which does something
>> > similar for PME?
>>
>> AFAICT the core PCI PM code follows the same ordering as the PM core
>> does, so it may be possible that not all service drivers get
>> resumed/suspended before other children (PCI buses).  Basically this
>> would be the same as just using core PM ops in the DPC driver (in
>> which case DPC-specific things are still kept in the DPC driver, not
>> in the PCI core).
>
> I'm not sure I follow this.  I assume the core PCI PM code guarantees
> that a bridge is suspended after its children and resumed before them.
> Are you saying that's not the case?

If this is a PCIe port, then there are two categories of children:
port services and the PCI devices below it.

There are no ordering constraints between the former and the latter,
which appears to be a problem here.
Bjorn Helgaas March 23, 2018, 10:01 p.m. UTC | #4
On Fri, Mar 23, 2018 at 10:11:53PM +0100, Rafael J. Wysocki wrote:
> On Fri, Mar 23, 2018 at 10:08 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Fri, Mar 23, 2018 at 01:18:56PM +0200, Mika Westerberg wrote:
> >> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> >> > I hope we can avoid adding suspend_late/resume_early callbacks in
> >> > struct pcie_port_service_driver, and I also hope we can avoid adding
> >> > device links.  Those both sound pretty complicated.
> >> >
> >> > Can you do something like the patch below, which does something
> >> > similar for PME?
> >>
> >> AFAICT the core PCI PM code follows the same ordering as the PM core
> >> does, so it may be possible that not all service drivers get
> >> resumed/suspended before other children (PCI buses).  Basically this
> >> would be the same as just using core PM ops in the DPC driver (in
> >> which case DPC-specific things are still kept in the DPC driver, not
> >> in the PCI core).
> >
> > I'm not sure I follow this.  I assume the core PCI PM code guarantees
> > that a bridge is suspended after its children and resumed before them.
> > Are you saying that's not the case?
> 
> If this is a PCIe port, then there are two categories of children:
> port services and the PCI devices below it.
> 
> There are no ordering constraints between the former and the latter,
> which appears to be a problem here.

It seems like a pretty fundamental problem if port services can be
suspended before PCI devices below the port.  I assume that would have
to be fixed somehow in the PCI core and the port driver, but this
patch only touches the DPC service driver.

I'd really like to get rid of the port services driver altogether, and
this ordering issue seems like a good reason to do that.
Rafael J. Wysocki March 24, 2018, 10:48 a.m. UTC | #5
On Fri, Mar 23, 2018 at 11:01 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Mar 23, 2018 at 10:11:53PM +0100, Rafael J. Wysocki wrote:
>> On Fri, Mar 23, 2018 at 10:08 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> > On Fri, Mar 23, 2018 at 01:18:56PM +0200, Mika Westerberg wrote:
>> >> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
>> >> > I hope we can avoid adding suspend_late/resume_early callbacks in
>> >> > struct pcie_port_service_driver, and I also hope we can avoid adding
>> >> > device links.  Those both sound pretty complicated.
>> >> >
>> >> > Can you do something like the patch below, which does something
>> >> > similar for PME?
>> >>
>> >> AFAICT the core PCI PM code follows the same ordering as the PM core
>> >> does, so it may be possible that not all service drivers get
>> >> resumed/suspended before other children (PCI buses).  Basically this
>> >> would be the same as just using core PM ops in the DPC driver (in
>> >> which case DPC-specific things are still kept in the DPC driver, not
>> >> in the PCI core).
>> >
>> > I'm not sure I follow this.  I assume the core PCI PM code guarantees
>> > that a bridge is suspended after its children and resumed before them.
>> > Are you saying that's not the case?
>>
>> If this is a PCIe port, then there are two categories of children:
>> port services and the PCI devices below it.
>>
>> There are no ordering constraints between the former and the latter,
>> which appears to be a problem here.
>
> It seems like a pretty fundamental problem if port services can be
> suspended before PCI devices below the port.  I assume that would have
> to be fixed somehow in the PCI core and the port driver, but this
> patch only touches the DPC service driver.
>
> I'd really like to get rid of the port services driver altogether, and
> this ordering issue seems like a good reason to do that.

I agree.

In fact, I was about to say the exact same thing, but you beat me to that. :-)
Lukas Wunner March 24, 2018, 12:15 p.m. UTC | #6
On Fri, Mar 23, 2018 at 05:01:27PM -0500, Bjorn Helgaas wrote:
> On Fri, Mar 23, 2018 at 10:11:53PM +0100, Rafael J. Wysocki wrote:
> > On Fri, Mar 23, 2018 at 10:08 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Fri, Mar 23, 2018 at 01:18:56PM +0200, Mika Westerberg wrote:
> > >> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> > >> > I hope we can avoid adding suspend_late/resume_early callbacks in
> > >> > struct pcie_port_service_driver, and I also hope we can avoid adding
> > >> > device links.  Those both sound pretty complicated.
> > >> >
> > >> > Can you do something like the patch below, which does something
> > >> > similar for PME?
> > >>
> > >> AFAICT the core PCI PM code follows the same ordering as the PM core
> > >> does, so it may be possible that not all service drivers get
> > >> resumed/suspended before other children (PCI buses).  Basically this
> > >> would be the same as just using core PM ops in the DPC driver (in
> > >> which case DPC-specific things are still kept in the DPC driver, not
> > >> in the PCI core).
> > >
> > > I'm not sure I follow this.  I assume the core PCI PM code guarantees
> > > that a bridge is suspended after its children and resumed before them.
> > > Are you saying that's not the case?
> > 
> > If this is a PCIe port, then there are two categories of children:
> > port services and the PCI devices below it.
> > 
> > There are no ordering constraints between the former and the latter,
> > which appears to be a problem here.
> 
> It seems like a pretty fundamental problem if port services can be
> suspended before PCI devices below the port.  I assume that would have
> to be fixed somehow in the PCI core and the port driver, but this
> patch only touches the DPC service driver.
> 
> I'd really like to get rid of the port services driver altogether, and
> this ordering issue seems like a good reason to do that.

As it is, there is nothing to fix.

The port services driver enforces that the services are resumed before
PCI child devices (and after them on suspend) by iterating over the
services' ->resume / ->suspend callbacks, see pcie_port_device_resume()
and pcie_port_device_suspend(), which in turn are the port's ->resume /
->suspend callbacks.
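
For reference, the iteration mentioned above looks roughly like this
(paraphrased from drivers/pci/pcie/portdrv_core.c around v4.16; the
resume path is symmetric):

static int suspend_iter(struct device *dev, void *data)
{
	struct pcie_port_service_driver *service_driver;

	/* Only act on bound port service devices below this port. */
	if ((dev->bus == &pcie_port_bus_type) && dev->driver) {
		service_driver = to_service_driver(dev->driver);
		if (service_driver->suspend)
			service_driver->suspend(to_pcie_device(dev));
	}

	return 0;
}

int pcie_port_device_suspend(struct device *dev)
{
	/* Visits the port service devices, not the PCI child devices. */
	return device_for_each_child(dev, NULL, suspend_iter);
}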

There is (or was, before your reorganization this cycle) an inconsistency
in the handling of ->resume / ->suspend stages vis-à-vis the ->resume_noirq /
->runtime_resume / ->runtime_suspend / ->runtime_idle stages.  The latter
are handled directly by portdrv callbacks whereas the former are handled
by per-service callbacks which we iterate over.

I cooked up these commits two years ago to make the handling more consistent
but ended up not needing them:
https://github.com/l1k/linux/commit/4f59372f830899e89fea9ba7c090680900d5998e
https://github.com/l1k/linux/commit/ce63b6ed01602e77c56043ee6dfa52ca9d257f88

Thanks,

Lukas
Lukas Wunner March 24, 2018, 1:48 p.m. UTC | #7
On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> I hope we can avoid adding suspend_late/resume_early callbacks in
> struct pcie_port_service_driver,

I'm fairly certain that we cannot avoid adding at least a ->resume_noirq
callback to struct pcie_port_service_driver to fix a pciehp use case:

On ->resume_noirq the PCI core walks down the hierarchy to put every
device in D0 and restore its state (with a few exceptions such as direct
complete).  However with hotplug ports, it's possible that the user has
unplugged devices while the system was asleep, or replaced them with
other devices.  That's a very real use case with Thunderbolt and we're
handling it poorly or not at all currently.  We need to check if the
devices below a hotplug port are still there or have been replaced
(can probably be recognized by looking at vendor/device IDs across the
entire sub-hierarchy) during the ->resume_noirq phase.  We could mark
them with pci_dev_set_disconnected(), then skip putting them into D0
if that flag has been set.
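
A rough sketch of that check (pcie_check_dev_present() and
pcie_hp_resume_noirq() are hypothetical names; pci_walk_bus() and
pci_dev_set_disconnected() are existing helpers in drivers/pci):

/*
 * Hypothetical sketch: during the hotplug port's ->resume_noirq,
 * re-read the Vendor/Device ID dword of each device in the
 * sub-hierarchy.  A device that no longer responds, or that answers
 * with a different ID, was unplugged or replaced while the system was
 * asleep; mark it disconnected so the core skips putting it into D0.
 */
static int pcie_check_dev_present(struct pci_dev *dev, void *unused)
{
	u32 id;

	if (pci_read_config_dword(dev, PCI_VENDOR_ID, &id) ||
	    id != (dev->vendor | ((u32)dev->device << 16)))
		pci_dev_set_disconnected(dev, NULL);

	return 0;
}

static void pcie_hp_resume_noirq(struct pci_dev *hotplug_port)
{
	if (hotplug_port->subordinate)
		pci_walk_bus(hotplug_port->subordinate,
			     pcie_check_dev_present, NULL);
}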


> @@ -102,7 +88,6 @@ static const struct dev_pm_ops pcie_portdrv_pm_ops = {
>  	.thaw		= pcie_port_device_resume,
>  	.poweroff	= pcie_port_device_suspend,
>  	.restore	= pcie_port_device_resume,
> -	.resume_noirq	= pcie_port_resume_noirq,
>  	.runtime_suspend = pcie_port_runtime_suspend,
>  	.runtime_resume	= pcie_port_runtime_resume,
>  	.runtime_idle	= pcie_port_runtime_idle,

So, unfortunately, the above would have to be reverted when we fix this
use case.

Thanks,

Lukas
Bjorn Helgaas March 24, 2018, 2:09 p.m. UTC | #8
On Sat, Mar 24, 2018 at 02:48:07PM +0100, Lukas Wunner wrote:
> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> > I hope we can avoid adding suspend_late/resume_early callbacks in
> > struct pcie_port_service_driver,
> 
> I'm fairly certain that we cannot avoid adding at least a ->resume_noirq
> callback to struct pcie_port_service_driver to fix a pciehp use case:
> 
> On ->resume_noirq the PCI core walks down the hierarchy to put every
> device in D0 and restore its state (with a few exceptions such as direct
> complete).  However with hotplug ports, it's possible that the user has
> unplugged devices while the system was asleep, or replaced them with
> other devices.  That's a very real use case with Thunderbolt and we're
> handling it poorly or not at all currently.  We need to check if the
> devices below a hotplug port are still there or have been replaced
> (can probably be recognized by looking at vendor/device IDs across the
> entire sub-hierarchy) during the ->resume_noirq phase.  We could mark
> them with pci_dev_set_disconnected(), then skip putting them into D0
> if that flag has been set.

This is definitely a real issue, and I think it affects more than just
Thunderbolt.  We've never handled undocks very well either.

I certainly agree we need a way to handle this; I would just rather do
it by integrating pciehp and the other services more directly into the
PCI core instead of using the port driver.  Maybe we'll need a
short-term ->resume_noirq, but I don't think the port driver is a good
long-term solution.

Bjorn
Mika Westerberg March 26, 2018, 9:55 a.m. UTC | #9
On Sat, Mar 24, 2018 at 09:09:50AM -0500, Bjorn Helgaas wrote:
> On Sat, Mar 24, 2018 at 02:48:07PM +0100, Lukas Wunner wrote:
> > On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> > > I hope we can avoid adding suspend_late/resume_early callbacks in
> > > struct pcie_port_service_driver,
> > 
> > I'm fairly certain that we cannot avoid adding at least a ->resume_noirq
> > callback to struct pcie_port_service_driver to fix a pciehp use case:
> > 
> > On ->resume_noirq the PCI core walks down the hierarchy to put every
> > device in D0 and restore its state (with a few exceptions such as direct
> > complete).  However with hotplug ports, it's possible that the user has
> > unplugged devices while the system was asleep, or replaced them with
> > other devices.  That's a very real use case with Thunderbolt and we're
> > handling it poorly or not at all currently.  We need to check if the
> > devices below a hotplug port are still there or have been replaced
> > (can probably be recognized by looking at vendor/device IDs across the
> > entire sub-hierarchy) during the ->resume_noirq phase.  We could mark
> > them with pci_dev_set_disconnected(), then skip putting them into D0
> > if that flag has been set.
> 
> This is definitely a real issue, and I think it affects more than just
> Thunderbolt.  We've never handled undocks very well either.
> 
> I certainly agree we need a way to handle this; I would just rather do
> it by integrating pciehp and the other services more directly into the
> PCI core instead of using the port driver.  Maybe we'll need a
> short-term ->resume_noirq, but I don't think the port driver is a good
> long-term solution.

In the context of this patch series, I'll drop this patch for now until
it is clear what the direction is.  I agree that moving port driver
functionality into the PCI core makes sense.

I'll send out an updated version of the first patch as it does not
require these hooks.

Patch

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 3bed6beda051..e561fa0f456c 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -525,6 +525,18 @@ static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
 	pci_fixup_device(pci_fixup_resume_early, pci_dev);
 }
 
+static void pcie_pme_root_status_cleanup(struct pci_dev *pci_dev)
+{
+	/*
+	 * Some BIOSes forget to clear Root PME Status bits after system
+	 * wakeup, which breaks ACPI-based runtime wakeup on PCI Express.
+	 * Clear those bits now just in case (shouldn't hurt).
+	 */
+	if (pci_is_pcie(pci_dev) &&
+	    pci_pcie_type(pci_dev) == PCI_EXP_TYPE_ROOT_PORT)
+		pcie_clear_root_pme_status(pci_dev);
+}
+
 /*
  * Default "suspend" method for devices that have no driver provided suspend,
  * or not even a driver at all (second part).
@@ -873,6 +885,8 @@ static int pci_pm_resume_noirq(struct device *dev)
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_resume_early(dev);
 
+	pcie_pme_root_status_cleanup(pci_dev);
+
 	if (drv && drv->pm && drv->pm->resume_noirq)
 		error = drv->pm->resume_noirq(dev);
 
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index d6f10a97d400..ec9e936c2a5b 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -61,20 +61,6 @@ static int pcie_portdrv_restore_config(struct pci_dev *dev)
 }
 
 #ifdef CONFIG_PM
-static int pcie_port_resume_noirq(struct device *dev)
-{
-	struct pci_dev *pdev = to_pci_dev(dev);
-
-	/*
-	 * Some BIOSes forget to clear Root PME Status bits after system wakeup
-	 * which breaks ACPI-based runtime wakeup on PCI Express, so clear those
-	 * bits now just in case (shouldn't hurt).
-	 */
-	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT)
-		pcie_clear_root_pme_status(pdev);
-	return 0;
-}
-
 static int pcie_port_runtime_suspend(struct device *dev)
 {
 	return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
@@ -102,7 +88,6 @@ static const struct dev_pm_ops pcie_portdrv_pm_ops = {
 	.thaw		= pcie_port_device_resume,
 	.poweroff	= pcie_port_device_suspend,
 	.restore	= pcie_port_device_resume,
-	.resume_noirq	= pcie_port_resume_noirq,
 	.runtime_suspend = pcie_port_runtime_suspend,
 	.runtime_resume	= pcie_port_runtime_resume,
 	.runtime_idle	= pcie_port_runtime_idle,