Message ID: 20180322193630.GB252023@bhelgaas-glaptop.roam.corp.google.com (mailing list archive)
State: New, archived
Delegated to: Bjorn Helgaas
On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> I hope we can avoid adding suspend_late/resume_early callbacks in
> struct pcie_port_service_driver, and I also hope we can avoid adding
> device links. Those both sound pretty complicated.
>
> Can you do something like the patch below, which does something
> similar for PME?

AFAICT the core PCI PM code follows the same ordering as the PM core
does, so it may be that not all service drivers get resumed/suspended
before other children (PCI buses). Basically this would be the same as
just using the core PM ops in the DPC driver (in which case the
DPC-specific things are still kept in the DPC driver, not in the PCI
core).

If we do not care about that, then I think both options are pretty
straightforward to implement.
On Fri, Mar 23, 2018 at 01:18:56PM +0200, Mika Westerberg wrote:
> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> > I hope we can avoid adding suspend_late/resume_early callbacks in
> > struct pcie_port_service_driver, and I also hope we can avoid adding
> > device links. Those both sound pretty complicated.
> >
> > Can you do something like the patch below, which does something
> > similar for PME?
>
> AFAICT the core PCI PM code follows the same ordering as the PM core
> does, so it may be that not all service drivers get resumed/suspended
> before other children (PCI buses). Basically this would be the same as
> just using the core PM ops in the DPC driver (in which case the
> DPC-specific things are still kept in the DPC driver, not in the PCI
> core).

I'm not sure I follow this. I assume the core PCI PM code guarantees
that a bridge is suspended after its children and resumed before them.
Are you saying that's not the case?
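For context, the parent/child guarantee Bjorn assumes here comes from
the PM core's dpm_list: devices are appended to the list after their
parents at registration time, dpm_suspend() takes entries from the tail,
and dpm_resume() walks from the head. A condensed sketch of that loop,
simplified from drivers/base/power/main.c (locking, refcounting and
error handling omitted):

int dpm_suspend(pm_message_t state)
{
        /*
         * The newest (deepest) devices sit at the tail of
         * dpm_prepared_list, so suspending from the tail handles
         * children before their parents; dpm_resume() does the
         * mirror-image walk from the head, resuming parents first.
         */
        while (!list_empty(&dpm_prepared_list)) {
                struct device *dev = to_device(dpm_prepared_list.prev);

                device_suspend(dev);
                list_move(&dev->power.entry, &dpm_suspended_list);
        }
        return 0;
}

Note that this ordering is purely parent/child based; it makes no
promises about the relative ordering of siblings.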
On Fri, Mar 23, 2018 at 10:08 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Mar 23, 2018 at 01:18:56PM +0200, Mika Westerberg wrote:
>> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
>> > I hope we can avoid adding suspend_late/resume_early callbacks in
>> > struct pcie_port_service_driver, and I also hope we can avoid adding
>> > device links. Those both sound pretty complicated.
>> >
>> > Can you do something like the patch below, which does something
>> > similar for PME?
>>
>> AFAICT the core PCI PM code follows the same ordering as the PM core
>> does, so it may be that not all service drivers get resumed/suspended
>> before other children (PCI buses). Basically this would be the same as
>> just using the core PM ops in the DPC driver (in which case the
>> DPC-specific things are still kept in the DPC driver, not in the PCI
>> core).
>
> I'm not sure I follow this. I assume the core PCI PM code guarantees
> that a bridge is suspended after its children and resumed before them.
> Are you saying that's not the case?

If this is a PCIe port, then there are two categories of children:
port services and the PCI devices below it.

There are no ordering constraints between the former and the latter,
which appears to be a problem here.
On Fri, Mar 23, 2018 at 10:11:53PM +0100, Rafael J. Wysocki wrote:
> On Fri, Mar 23, 2018 at 10:08 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > On Fri, Mar 23, 2018 at 01:18:56PM +0200, Mika Westerberg wrote:
> >> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> >> > I hope we can avoid adding suspend_late/resume_early callbacks in
> >> > struct pcie_port_service_driver, and I also hope we can avoid adding
> >> > device links. Those both sound pretty complicated.
> >> >
> >> > Can you do something like the patch below, which does something
> >> > similar for PME?
> >>
> >> AFAICT the core PCI PM code follows the same ordering as the PM core
> >> does, so it may be that not all service drivers get resumed/suspended
> >> before other children (PCI buses). Basically this would be the same as
> >> just using the core PM ops in the DPC driver (in which case the
> >> DPC-specific things are still kept in the DPC driver, not in the PCI
> >> core).
> >
> > I'm not sure I follow this. I assume the core PCI PM code guarantees
> > that a bridge is suspended after its children and resumed before them.
> > Are you saying that's not the case?
>
> If this is a PCIe port, then there are two categories of children:
> port services and the PCI devices below it.
>
> There are no ordering constraints between the former and the latter,
> which appears to be a problem here.

It seems like a pretty fundamental problem if port services can be
suspended before PCI devices below the port. I assume that would have
to be fixed somehow in the PCI core and the port driver, but this patch
only touches the DPC service driver.

I'd really like to get rid of the port services driver altogether, and
this ordering issue seems like a good reason to do that.
On Fri, Mar 23, 2018 at 11:01 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> On Fri, Mar 23, 2018 at 10:11:53PM +0100, Rafael J. Wysocki wrote:
>> On Fri, Mar 23, 2018 at 10:08 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
>> > On Fri, Mar 23, 2018 at 01:18:56PM +0200, Mika Westerberg wrote:
>> >> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
>> >> > I hope we can avoid adding suspend_late/resume_early callbacks in
>> >> > struct pcie_port_service_driver, and I also hope we can avoid adding
>> >> > device links. Those both sound pretty complicated.
>> >> >
>> >> > Can you do something like the patch below, which does something
>> >> > similar for PME?
>> >>
>> >> AFAICT the core PCI PM code follows the same ordering as the PM core
>> >> does, so it may be that not all service drivers get resumed/suspended
>> >> before other children (PCI buses). Basically this would be the same as
>> >> just using the core PM ops in the DPC driver (in which case the
>> >> DPC-specific things are still kept in the DPC driver, not in the PCI
>> >> core).
>> >
>> > I'm not sure I follow this. I assume the core PCI PM code guarantees
>> > that a bridge is suspended after its children and resumed before them.
>> > Are you saying that's not the case?
>>
>> If this is a PCIe port, then there are two categories of children:
>> port services and the PCI devices below it.
>>
>> There are no ordering constraints between the former and the latter,
>> which appears to be a problem here.
>
> It seems like a pretty fundamental problem if port services can be
> suspended before PCI devices below the port. I assume that would have
> to be fixed somehow in the PCI core and the port driver, but this patch
> only touches the DPC service driver.
>
> I'd really like to get rid of the port services driver altogether, and
> this ordering issue seems like a good reason to do that.

I agree. In fact, I was about to say the exact same thing, but you beat
me to that. :-)
On Fri, Mar 23, 2018 at 05:01:27PM -0500, Bjorn Helgaas wrote:
> On Fri, Mar 23, 2018 at 10:11:53PM +0100, Rafael J. Wysocki wrote:
> > On Fri, Mar 23, 2018 at 10:08 PM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> > > On Fri, Mar 23, 2018 at 01:18:56PM +0200, Mika Westerberg wrote:
> > >> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> > >> > I hope we can avoid adding suspend_late/resume_early callbacks in
> > >> > struct pcie_port_service_driver, and I also hope we can avoid adding
> > >> > device links. Those both sound pretty complicated.
> > >> >
> > >> > Can you do something like the patch below, which does something
> > >> > similar for PME?
> > >>
> > >> AFAICT the core PCI PM code follows the same ordering as the PM core
> > >> does, so it may be that not all service drivers get resumed/suspended
> > >> before other children (PCI buses). Basically this would be the same as
> > >> just using the core PM ops in the DPC driver (in which case the
> > >> DPC-specific things are still kept in the DPC driver, not in the PCI
> > >> core).
> > >
> > > I'm not sure I follow this. I assume the core PCI PM code guarantees
> > > that a bridge is suspended after its children and resumed before them.
> > > Are you saying that's not the case?
> >
> > If this is a PCIe port, then there are two categories of children:
> > port services and the PCI devices below it.
> >
> > There are no ordering constraints between the former and the latter,
> > which appears to be a problem here.
>
> It seems like a pretty fundamental problem if port services can be
> suspended before PCI devices below the port. I assume that would have
> to be fixed somehow in the PCI core and the port driver, but this patch
> only touches the DPC service driver.
>
> I'd really like to get rid of the port services driver altogether, and
> this ordering issue seems like a good reason to do that.

As it is, there is nothing to fix. The port services driver enforces
that the services are resumed before PCI child devices (and after them
on suspend) by iterating over the services' ->resume / ->suspend
callbacks, see pcie_port_device_resume() and pcie_port_device_suspend(),
which in turn are the port's ->resume / ->suspend callbacks.

There is (or was, before your reorganization this cycle) an
inconsistency in the handling of the ->resume / ->suspend stages
vis-à-vis the ->resume_noirq / ->runtime_resume / ->runtime_suspend /
->runtime_idle stages: the latter are handled directly by portdrv
callbacks, whereas the former are handled by per-service callbacks
which we iterate over.

I cooked up these commits two years ago to make the handling more
consistent but ended up not needing them:

https://github.com/l1k/linux/commit/4f59372f830899e89fea9ba7c090680900d5998e
https://github.com/l1k/linux/commit/ce63b6ed01602e77c56043ee6dfa52ca9d257f88

Thanks,

Lukas
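The iteration Lukas refers to lives in drivers/pci/pcie/portdrv_core.c.
A condensed sketch of the mechanism, simplified from the code of that
era (error handling and the other PM stages omitted):

typedef int (*pcie_pm_callback_t)(struct pcie_device *);

/*
 * For each child device on the PCIe port service bus, look up the
 * bound service driver and invoke the callback found at the given
 * offset into struct pcie_port_service_driver (->suspend or ->resume).
 */
static int pm_iter(struct device *dev, void *data)
{
        struct pcie_port_service_driver *service_driver;
        size_t offset = *(size_t *)data;
        pcie_pm_callback_t cb;

        if (dev->bus == &pcie_port_bus_type && dev->driver) {
                service_driver = to_service_driver(dev->driver);
                cb = *(pcie_pm_callback_t *)((void *)service_driver + offset);
                if (cb)
                        return cb(to_pcie_device(dev));
        }
        return 0;
}

/*
 * These are the port's own ->suspend / ->resume PM ops, so the service
 * callbacks run whenever the port itself suspends or resumes, i.e.
 * after the PCI devices below the port on suspend and before them on
 * resume.
 */
int pcie_port_device_suspend(struct device *dev)
{
        size_t off = offsetof(struct pcie_port_service_driver, suspend);
        return device_for_each_child(dev, &off, pm_iter);
}

int pcie_port_device_resume(struct device *dev)
{
        size_t off = offsetof(struct pcie_port_service_driver, resume);
        return device_for_each_child(dev, &off, pm_iter);
}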
On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> I hope we can avoid adding suspend_late/resume_early callbacks in
> struct pcie_port_service_driver,

I'm fairly certain that we cannot avoid adding at least a
->resume_noirq callback to struct pcie_port_service_driver to fix a
pciehp use case:

On ->resume_noirq the PCI core walks down the hierarchy to put every
device in D0 and restore its state (with a few exceptions such as
direct complete). However with hotplug ports, it's possible that the
user has unplugged devices while the system was asleep, or replaced
them with other devices. That's a very real use case with Thunderbolt
and we're handling it poorly or not at all currently. We need to check
whether the devices below a hotplug port are still there or have been
replaced (which can probably be recognized by looking at vendor/device
IDs across the entire sub-hierarchy) during the ->resume_noirq phase.
We could mark them with pci_dev_set_disconnected(), then skip putting
them into D0 if that flag has been set.

> @@ -102,7 +88,6 @@ static const struct dev_pm_ops pcie_portdrv_pm_ops = {
>  	.thaw = pcie_port_device_resume,
>  	.poweroff = pcie_port_device_suspend,
>  	.restore = pcie_port_device_resume,
> -	.resume_noirq = pcie_port_resume_noirq,
>  	.runtime_suspend = pcie_port_runtime_suspend,
>  	.runtime_resume = pcie_port_runtime_resume,
>  	.runtime_idle = pcie_port_runtime_idle,

So the above would unfortunately have to be reverted when we fix this
use case.

Thanks,

Lukas
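To make the idea concrete, here is a minimal, hypothetical sketch of
such a service-level ->resume_noirq hook. The hook itself and its name
do not exist; pci_read_config_dword() and pci_dev_set_disconnected()
are existing PCI interfaces:

/*
 * Hypothetical sketch of the presence check described above; no such
 * callback exists in struct pcie_port_service_driver today.  A full
 * implementation would walk the entire sub-hierarchy, not just the
 * port's secondary bus, and mark all descendants disconnected too.
 */
static int pciehp_resume_noirq(struct pcie_device *srv)
{
        struct pci_bus *bus = srv->port->subordinate;
        struct pci_dev *dev;
        u32 id;

        if (!bus)
                return 0;

        list_for_each_entry(dev, &bus->devices, bus_list) {
                /*
                 * Re-read the Vendor/Device ID dword; a failed read or
                 * a mismatch means the device was unplugged or swapped
                 * for another one while the system was asleep.
                 */
                if (pci_read_config_dword(dev, PCI_VENDOR_ID, &id) ||
                    id != (dev->vendor | ((u32)dev->device << 16)))
                        pci_dev_set_disconnected(dev, NULL);
        }
        return 0;
}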
On Sat, Mar 24, 2018 at 02:48:07PM +0100, Lukas Wunner wrote:
> On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> > I hope we can avoid adding suspend_late/resume_early callbacks in
> > struct pcie_port_service_driver,
>
> I'm fairly certain that we cannot avoid adding at least a
> ->resume_noirq callback to struct pcie_port_service_driver to fix a
> pciehp use case:
>
> On ->resume_noirq the PCI core walks down the hierarchy to put every
> device in D0 and restore its state (with a few exceptions such as
> direct complete). However with hotplug ports, it's possible that the
> user has unplugged devices while the system was asleep, or replaced
> them with other devices. That's a very real use case with Thunderbolt
> and we're handling it poorly or not at all currently. We need to check
> whether the devices below a hotplug port are still there or have been
> replaced (which can probably be recognized by looking at vendor/device
> IDs across the entire sub-hierarchy) during the ->resume_noirq phase.
> We could mark them with pci_dev_set_disconnected(), then skip putting
> them into D0 if that flag has been set.

This is definitely a real issue, and I think it affects more than just
Thunderbolt. We've never handled undocks very well either.

I certainly agree we need a way to handle this; I would just rather do
it by integrating pciehp and the other services more directly into the
PCI core instead of using the port driver. Maybe we'll need a
short-term ->resume_noirq, but I don't think the port driver is a good
long-term solution.

Bjorn
On Sat, Mar 24, 2018 at 09:09:50AM -0500, Bjorn Helgaas wrote:
> On Sat, Mar 24, 2018 at 02:48:07PM +0100, Lukas Wunner wrote:
> > On Thu, Mar 22, 2018 at 02:36:30PM -0500, Bjorn Helgaas wrote:
> > > I hope we can avoid adding suspend_late/resume_early callbacks in
> > > struct pcie_port_service_driver,
> >
> > I'm fairly certain that we cannot avoid adding at least a
> > ->resume_noirq callback to struct pcie_port_service_driver to fix a
> > pciehp use case:
> >
> > On ->resume_noirq the PCI core walks down the hierarchy to put every
> > device in D0 and restore its state (with a few exceptions such as
> > direct complete). However with hotplug ports, it's possible that the
> > user has unplugged devices while the system was asleep, or replaced
> > them with other devices. That's a very real use case with Thunderbolt
> > and we're handling it poorly or not at all currently. We need to check
> > whether the devices below a hotplug port are still there or have been
> > replaced (which can probably be recognized by looking at vendor/device
> > IDs across the entire sub-hierarchy) during the ->resume_noirq phase.
> > We could mark them with pci_dev_set_disconnected(), then skip putting
> > them into D0 if that flag has been set.
>
> This is definitely a real issue, and I think it affects more than just
> Thunderbolt. We've never handled undocks very well either.
>
> I certainly agree we need a way to handle this; I would just rather do
> it by integrating pciehp and the other services more directly into the
> PCI core instead of using the port driver. Maybe we'll need a
> short-term ->resume_noirq, but I don't think the port driver is a good
> long-term solution.

I think I'll drop this patch from the series for now, until it is clear
what the direction is. I agree that moving the port driver
functionality into the PCI core makes sense.

I'll send out an updated version of the first patch, as it does not
require these hooks.
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index 3bed6beda051..e561fa0f456c 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -525,6 +525,18 @@ static void pci_pm_default_resume_early(struct pci_dev *pci_dev)
 	pci_fixup_device(pci_fixup_resume_early, pci_dev);
 }
 
+static void pcie_pme_root_status_cleanup(struct pci_dev *pci_dev)
+{
+	/*
+	 * Some BIOSes forget to clear Root PME Status bits after system
+	 * wakeup, which breaks ACPI-based runtime wakeup on PCI Express.
+	 * Clear those bits now just in case (shouldn't hurt).
+	 */
+	if (pci_is_pcie(pci_dev) &&
+	    pci_pcie_type(pci_dev) == PCI_EXP_TYPE_ROOT_PORT)
+		pcie_clear_root_pme_status(pci_dev);
+}
+
 /*
  * Default "suspend" method for devices that have no driver provided suspend,
  * or not even a driver at all (second part).
@@ -873,6 +885,8 @@ static int pci_pm_resume_noirq(struct device *dev)
 	if (pci_has_legacy_pm_support(pci_dev))
 		return pci_legacy_resume_early(dev);
 
+	pcie_pme_root_status_cleanup(pci_dev);
+
 	if (drv && drv->pm && drv->pm->resume_noirq)
 		error = drv->pm->resume_noirq(dev);
 
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index d6f10a97d400..ec9e936c2a5b 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -61,20 +61,6 @@ static int pcie_portdrv_restore_config(struct pci_dev *dev)
 }
 
 #ifdef CONFIG_PM
-static int pcie_port_resume_noirq(struct device *dev)
-{
-	struct pci_dev *pdev = to_pci_dev(dev);
-
-	/*
-	 * Some BIOSes forget to clear Root PME Status bits after system wakeup
-	 * which breaks ACPI-based runtime wakeup on PCI Express, so clear those
-	 * bits now just in case (shouldn't hurt).
-	 */
-	if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT)
-		pcie_clear_root_pme_status(pdev);
-	return 0;
-}
-
 static int pcie_port_runtime_suspend(struct device *dev)
 {
 	return to_pci_dev(dev)->bridge_d3 ? 0 : -EBUSY;
@@ -102,7 +88,6 @@ static const struct dev_pm_ops pcie_portdrv_pm_ops = {
 	.thaw = pcie_port_device_resume,
 	.poweroff = pcie_port_device_suspend,
 	.restore = pcie_port_device_resume,
-	.resume_noirq = pcie_port_resume_noirq,
 	.runtime_suspend = pcie_port_runtime_suspend,
 	.runtime_resume = pcie_port_runtime_resume,
 	.runtime_idle = pcie_port_runtime_idle,