diff mbox series

[RFC] PCI/portdrv: Don't disable pci device during shutdown

Message ID 1596268180-9114-1-git-send-email-chenhc@lemote.com (mailing list archive)
State Superseded, archived
Delegated to: Bjorn Helgaas
Headers show
Series [RFC] PCI/portdrv: Don't disable pci device during shutdown | expand

Commit Message

Huacai Chen Aug. 1, 2020, 7:49 a.m. UTC
Use separate remove()/shutdown() callback, and don't disable pci device
during shutdown. This can avoid some poweroff/reboot failures.

The poweroff/reboot failures can easily reproduce on Loongson platforms.
I think this is not a Loongson-specific problem, instead, is a problem
related to some specific PCI hosts. On some x86 platforms, radeon/amdgpu
devices can cause the same problem, and commit faefba95c9e8ca3a523831c2e
("drm/amdgpu: just suspend the hw on pci shutdown") can resolve it.

Radeon driver is more difficult than amdgpu due to its confusing symbol
names, and I have maintained an out-of-tree patch for a long time [1].
Recently, we found more and more devices can cause the same problem, and
it is very difficult to modify all problematic drivers as radeon/amdgpu
does. So, I think modify the PCIe port driver is a simple and effective
way.

[1] https://github.com/chenhuacai/linux/commit/6612f9c1fc290d42a14618ce9a7d03014d8ebb1a

Signed-off-by: Huacai Chen <chenhc@lemote.com>
---
 drivers/pci/pcie/portdrv.h      |  2 +-
 drivers/pci/pcie/portdrv_core.c |  6 ++++--
 drivers/pci/pcie/portdrv_pci.c  | 15 +++++++++++++--
 3 files changed, 18 insertions(+), 5 deletions(-)

Comments

Bjorn Helgaas Sept. 10, 2020, 8:32 p.m. UTC | #1
[+cc Tiezhu]

On Sat, Aug 01, 2020 at 03:49:40PM +0800, Huacai Chen wrote:
> Use separate remove()/shutdown() callback, and don't disable pci device
> during shutdown. This can avoid some poweroff/reboot failures.
> 
> The poweroff/reboot failures can easily reproduce on Loongson platforms.
> I think this is not a Loongson-specific problem, instead, is a problem
> related to some specific PCI hosts. On some x86 platforms, radeon/amdgpu
> devices can cause the same problem, and commit faefba95c9e8ca3a523831c2e
> ("drm/amdgpu: just suspend the hw on pci shutdown") can resolve it.
>
> Radeon driver is more difficult than amdgpu due to its confusing symbol
> names, and I have maintained an out-of-tree patch for a long time [1].
> Recently, we found more and more devices can cause the same problem, and
> it is very difficult to modify all problematic drivers as radeon/amdgpu
> does. So, I think modify the PCIe port driver is a simple and effective
> way.

Sounds plausible, but I don't know what the actual problem is, other
than "poweroff/reboot doesn't work, and this patch 'fixes' it".

> [1] https://github.com/chenhuacai/linux/commit/6612f9c1fc290d42a14618ce9a7d03014d8ebb1a
> 
> Signed-off-by: Huacai Chen <chenhc@lemote.com>
> ---
>  drivers/pci/pcie/portdrv.h      |  2 +-
>  drivers/pci/pcie/portdrv_core.c |  6 ++++--
>  drivers/pci/pcie/portdrv_pci.c  | 15 +++++++++++++--
>  3 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index af7cf23..22ba7b9 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -123,7 +123,7 @@ int pcie_port_device_resume(struct device *dev);
>  int pcie_port_device_runtime_suspend(struct device *dev);
>  int pcie_port_device_runtime_resume(struct device *dev);
>  #endif
> -void pcie_port_device_remove(struct pci_dev *dev);
> +void pcie_port_device_remove(struct pci_dev *dev, bool disable);
>  int __must_check pcie_port_bus_register(void);
>  void pcie_port_bus_unregister(void);
>  
> diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> index 50a9522..aa165be 100644
> --- a/drivers/pci/pcie/portdrv_core.c
> +++ b/drivers/pci/pcie/portdrv_core.c
> @@ -487,11 +487,13 @@ EXPORT_SYMBOL_GPL(pcie_port_find_device);
>   * Remove PCI Express port service devices associated with given port and
>   * disable MSI-X or MSI for the port.
>   */
> -void pcie_port_device_remove(struct pci_dev *dev)
> +void pcie_port_device_remove(struct pci_dev *dev, bool disable)
>  {
>  	device_for_each_child(&dev->dev, NULL, remove_iter);
>  	pci_free_irq_vectors(dev);
> -	pci_disable_device(dev);
> +
> +	if (disable)
> +		pci_disable_device(dev);
>  }

portdrv can only be built statically (not as a module), and I don't
think there's value in being able to unbind it.  So I think we should
remove its .remove() method.  That would be a separate patch, of
course, but it should make this one quite a bit simpler.

>  /**
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index 3acf151..fcd3d2e 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -142,7 +142,18 @@ static void pcie_portdrv_remove(struct pci_dev *dev)
>  		pm_runtime_dont_use_autosuspend(&dev->dev);
>  	}
>  
> -	pcie_port_device_remove(dev);
> +	pcie_port_device_remove(dev, true);
> +}
> +
> +static void pcie_portdrv_shutdown(struct pci_dev *dev)
> +{
> +	if (pci_bridge_d3_possible(dev)) {
> +		pm_runtime_forbid(&dev->dev);
> +		pm_runtime_get_noresume(&dev->dev);
> +		pm_runtime_dont_use_autosuspend(&dev->dev);
> +	}

I admit to being ignorant about how all this PM stuff works.  A
comment here about what's going on and why we need to check
pci_bridge_d3_possible() would be useful.

> +	pcie_port_device_remove(dev, false);
>  }
>  
>  static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
> @@ -211,7 +222,7 @@ static struct pci_driver pcie_portdriver = {
>  
>  	.probe		= pcie_portdrv_probe,
>  	.remove		= pcie_portdrv_remove,
> -	.shutdown	= pcie_portdrv_remove,
> +	.shutdown	= pcie_portdrv_shutdown,
>  
>  	.err_handler	= &pcie_portdrv_err_handler,
>  
> -- 
> 2.7.0
>
Huacai Chen Sept. 11, 2020, 4:03 a.m. UTC | #2
Hi, all,

On Fri, Sep 11, 2020 at 4:32 AM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Tiezhu]
>
> On Sat, Aug 01, 2020 at 03:49:40PM +0800, Huacai Chen wrote:
> > Use separate remove()/shutdown() callback, and don't disable pci device
> > during shutdown. This can avoid some poweroff/reboot failures.
> >
> > The poweroff/reboot failures can easily reproduce on Loongson platforms.
> > I think this is not a Loongson-specific problem, instead, is a problem
> > related to some specific PCI hosts. On some x86 platforms, radeon/amdgpu
> > devices can cause the same problem, and commit faefba95c9e8ca3a523831c2e
> > ("drm/amdgpu: just suspend the hw on pci shutdown") can resolve it.
> >
> > Radeon driver is more difficult than amdgpu due to its confusing symbol
> > names, and I have maintained an out-of-tree patch for a long time [1].
> > Recently, we found more and more devices can cause the same problem, and
> > it is very difficult to modify all problematic drivers as radeon/amdgpu
> > does. So, I think modify the PCIe port driver is a simple and effective
> > way.
>
> Sounds plausible, but I don't know what the actual problem is, other
> than "poweroff/reboot doesn't work, and this patch 'fixes' it".
After all discussions (including those in the other thread), I think
the actual problem is "every driver should implement their shutdown
callback correctly as radeon/amdgpu does, otherwise there will be
poweroff failures because of device's BUS MASTER".

So, I think I should give two patches, the first remove the "remove"
callback, and the second just simply remove the "pci_disable_device()"
instead of modifying a ton of drivers, right?

Huacai
>
> > [1] https://github.com/chenhuacai/linux/commit/6612f9c1fc290d42a14618ce9a7d03014d8ebb1a
> >
> > Signed-off-by: Huacai Chen <chenhc@lemote.com>
> > ---
> >  drivers/pci/pcie/portdrv.h      |  2 +-
> >  drivers/pci/pcie/portdrv_core.c |  6 ++++--
> >  drivers/pci/pcie/portdrv_pci.c  | 15 +++++++++++++--
> >  3 files changed, 18 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> > index af7cf23..22ba7b9 100644
> > --- a/drivers/pci/pcie/portdrv.h
> > +++ b/drivers/pci/pcie/portdrv.h
> > @@ -123,7 +123,7 @@ int pcie_port_device_resume(struct device *dev);
> >  int pcie_port_device_runtime_suspend(struct device *dev);
> >  int pcie_port_device_runtime_resume(struct device *dev);
> >  #endif
> > -void pcie_port_device_remove(struct pci_dev *dev);
> > +void pcie_port_device_remove(struct pci_dev *dev, bool disable);
> >  int __must_check pcie_port_bus_register(void);
> >  void pcie_port_bus_unregister(void);
> >
> > diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
> > index 50a9522..aa165be 100644
> > --- a/drivers/pci/pcie/portdrv_core.c
> > +++ b/drivers/pci/pcie/portdrv_core.c
> > @@ -487,11 +487,13 @@ EXPORT_SYMBOL_GPL(pcie_port_find_device);
> >   * Remove PCI Express port service devices associated with given port and
> >   * disable MSI-X or MSI for the port.
> >   */
> > -void pcie_port_device_remove(struct pci_dev *dev)
> > +void pcie_port_device_remove(struct pci_dev *dev, bool disable)
> >  {
> >       device_for_each_child(&dev->dev, NULL, remove_iter);
> >       pci_free_irq_vectors(dev);
> > -     pci_disable_device(dev);
> > +
> > +     if (disable)
> > +             pci_disable_device(dev);
> >  }
>
> portdrv can only be built statically (not as a module), and I don't
> think there's value in being able to unbind it.  So I think we should
> remove its .remove() method.  That would be a separate patch, of
> course, but it should make this one quite a bit simpler.
>
> >  /**
> > diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> > index 3acf151..fcd3d2e 100644
> > --- a/drivers/pci/pcie/portdrv_pci.c
> > +++ b/drivers/pci/pcie/portdrv_pci.c
> > @@ -142,7 +142,18 @@ static void pcie_portdrv_remove(struct pci_dev *dev)
> >               pm_runtime_dont_use_autosuspend(&dev->dev);
> >       }
> >
> > -     pcie_port_device_remove(dev);
> > +     pcie_port_device_remove(dev, true);
> > +}
> > +
> > +static void pcie_portdrv_shutdown(struct pci_dev *dev)
> > +{
> > +     if (pci_bridge_d3_possible(dev)) {
> > +             pm_runtime_forbid(&dev->dev);
> > +             pm_runtime_get_noresume(&dev->dev);
> > +             pm_runtime_dont_use_autosuspend(&dev->dev);
> > +     }
>
> I admit to being ignorant about how all this PM stuff works.  A
> comment here about what's going on and why we need to check
> pci_bridge_d3_possible() would be useful.
>
> > +     pcie_port_device_remove(dev, false);
> >  }
> >
> >  static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
> > @@ -211,7 +222,7 @@ static struct pci_driver pcie_portdriver = {
> >
> >       .probe          = pcie_portdrv_probe,
> >       .remove         = pcie_portdrv_remove,
> > -     .shutdown       = pcie_portdrv_remove,
> > +     .shutdown       = pcie_portdrv_shutdown,
> >
> >       .err_handler    = &pcie_portdrv_err_handler,
> >
> > --
> > 2.7.0
> >
diff mbox series

Patch

diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index af7cf23..22ba7b9 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -123,7 +123,7 @@  int pcie_port_device_resume(struct device *dev);
 int pcie_port_device_runtime_suspend(struct device *dev);
 int pcie_port_device_runtime_resume(struct device *dev);
 #endif
-void pcie_port_device_remove(struct pci_dev *dev);
+void pcie_port_device_remove(struct pci_dev *dev, bool disable);
 int __must_check pcie_port_bus_register(void);
 void pcie_port_bus_unregister(void);
 
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 50a9522..aa165be 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -487,11 +487,13 @@  EXPORT_SYMBOL_GPL(pcie_port_find_device);
  * Remove PCI Express port service devices associated with given port and
  * disable MSI-X or MSI for the port.
  */
-void pcie_port_device_remove(struct pci_dev *dev)
+void pcie_port_device_remove(struct pci_dev *dev, bool disable)
 {
 	device_for_each_child(&dev->dev, NULL, remove_iter);
 	pci_free_irq_vectors(dev);
-	pci_disable_device(dev);
+
+	if (disable)
+		pci_disable_device(dev);
 }
 
 /**
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 3acf151..fcd3d2e 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -142,7 +142,18 @@  static void pcie_portdrv_remove(struct pci_dev *dev)
 		pm_runtime_dont_use_autosuspend(&dev->dev);
 	}
 
-	pcie_port_device_remove(dev);
+	pcie_port_device_remove(dev, true);
+}
+
+static void pcie_portdrv_shutdown(struct pci_dev *dev)
+{
+	if (pci_bridge_d3_possible(dev)) {
+		pm_runtime_forbid(&dev->dev);
+		pm_runtime_get_noresume(&dev->dev);
+		pm_runtime_dont_use_autosuspend(&dev->dev);
+	}
+
+	pcie_port_device_remove(dev, false);
 }
 
 static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
@@ -211,7 +222,7 @@  static struct pci_driver pcie_portdriver = {
 
 	.probe		= pcie_portdrv_probe,
 	.remove		= pcie_portdrv_remove,
-	.shutdown	= pcie_portdrv_remove,
+	.shutdown	= pcie_portdrv_shutdown,
 
 	.err_handler	= &pcie_portdrv_err_handler,