ACPI: PM: Avoid attaching ACPI PM domain to certain devices
diff mbox series

Message ID 1773028.iBGNyVBcMc@kreacher
State Awaiting Upstream
Delegated to: Rafael Wysocki
Headers show
Series
  • ACPI: PM: Avoid attaching ACPI PM domain to certain devices
Related show

Commit Message

Rafael J. Wysocki Dec. 4, 2019, 1:54 a.m. UTC
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Certain ACPI-enumerated devices represented as platform devices in
Linux, like fans, require special low-level power management handling
implemented by their drivers that is not in agreement with the ACPI
PM domain behavior.  That leads to problems with managing ACPI fans
during system-wide suspend and resume.

For this reason, make acpi_dev_pm_attach() skip the affected devices
by adding a list of device IDs to avoid to it and putting the IDs of
the affected devices into that list.

Fixes: e5cc8ef31267 (ACPI / PM: Provide ACPI PM callback routines for subsystems)
Reported-by: Zhang Rui <rui.zhang@intel.com>
Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

Rui,

Please test this on the machine(s) affected by the fan suspend/resume issues.

I don't really see any cleaner way to address this problem, because the
ACPI PM domain should not be used with the devices in question even if
the driver that binds to them is not loaded.

Cheers,
Rafael

---
 drivers/acpi/device_pm.c |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Zhang Rui Dec. 4, 2019, 2:04 p.m. UTC | #1
On Wed, 2019-12-04 at 02:54 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Certain ACPI-enumerated devices represented as platform devices in
> Linux, like fans, require special low-level power management handling
> implemented by their drivers that is not in agreement with the ACPI
> PM domain behavior.  That leads to problems with managing ACPI fans
> during system-wide suspend and resume.
> 
> For this reason, make acpi_dev_pm_attach() skip the affected devices
> by adding a list of device IDs to avoid to it and putting the IDs of
> the affected devices into that list.
> 
> Fixes: e5cc8ef31267 (ACPI / PM: Provide ACPI PM callback routines for
> subsystems)
> Reported-by: Zhang Rui <rui.zhang@intel.com>
> Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> Rui,
> 
> Please test this on the machine(s) affected by the fan suspend/resume
> issues.

Sure, Todd and I will re-run stress test with this patch applied when
5.5-rc1 released.

thanks,
rui

> 
> I don't really see any cleaner way to address this problem, because
> the
> ACPI PM domain should not be used with the devices in question even
> if
> the driver that binds to them is not loaded.
> 
> Cheers,
> Rafael
> 
> ---
>  drivers/acpi/device_pm.c |   12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/acpi/device_pm.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/device_pm.c
> +++ linux-pm/drivers/acpi/device_pm.c
> @@ -1314,9 +1314,19 @@ static void acpi_dev_pm_detach(struct de
>   */
>  int acpi_dev_pm_attach(struct device *dev, bool power_on)
>  {
> +	/*
> +	 * Skip devices whose ACPI companions match the device IDs
> below,
> +	 * because they require special power management handling
> incompatible
> +	 * with the generic ACPI PM domain.
> +	 */
> +	static const struct acpi_device_id special_pm_ids[] = {
> +		{"PNP0C0B", }, /* Generic ACPI fan */
> +		{"INT3404", }, /* Fan */
> +		{}
> +	};
>  	struct acpi_device *adev = ACPI_COMPANION(dev);
>  
> -	if (!adev)
> +	if (!adev || !acpi_match_device_ids(adev, special_pm_ids))
>  		return 0;
>  
>  	/*
> 
> 
>
Todd Brandt Dec. 5, 2019, 5:32 p.m. UTC | #2
On Wed, 2019-12-04 at 22:04 +0800, Zhang Rui wrote:
> On Wed, 2019-12-04 at 02:54 +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > 
> > Certain ACPI-enumerated devices represented as platform devices in
> > Linux, like fans, require special low-level power management
> > handling
> > implemented by their drivers that is not in agreement with the ACPI
> > PM domain behavior.  That leads to problems with managing ACPI fans
> > during system-wide suspend and resume.
> > 
> > For this reason, make acpi_dev_pm_attach() skip the affected
> > devices
> > by adding a list of device IDs to avoid to it and putting the IDs
> > of
> > the affected devices into that list.
> > 
> > Fixes: e5cc8ef31267 (ACPI / PM: Provide ACPI PM callback routines
> > for
> > subsystems)
> > Reported-by: Zhang Rui <rui.zhang@intel.com>
> > Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> > 
> > Rui,
> > 
> > Please test this on the machine(s) affected by the fan
> > suspend/resume
> > issues.
> 
> Sure, Todd and I will re-run stress test with this patch applied when
> 5.5-rc1 released.

I've applied it 5.4.0 and will do a full stress test run this weekend
in the lab (where 7 out of 20 machines have this issue). The kernel
will be called "5.4.0-acpifanfix", and the data should be ready Sunday
Dec 8.

This is the issue I'll test for:
https://bugzilla.kernel.org/show_bug.cgi?id=204321

> 
> thanks,
> rui
> 
> > 
> > I don't really see any cleaner way to address this problem, because
> > the
> > ACPI PM domain should not be used with the devices in question even
> > if
> > the driver that binds to them is not loaded.
> > 
> > Cheers,
> > Rafael
> > 
> > ---
> >  drivers/acpi/device_pm.c |   12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > Index: linux-pm/drivers/acpi/device_pm.c
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/device_pm.c
> > +++ linux-pm/drivers/acpi/device_pm.c
> > @@ -1314,9 +1314,19 @@ static void acpi_dev_pm_detach(struct de
> >   */
> >  int acpi_dev_pm_attach(struct device *dev, bool power_on)
> >  {
> > +	/*
> > +	 * Skip devices whose ACPI companions match the device IDs
> > below,
> > +	 * because they require special power management handling
> > incompatible
> > +	 * with the generic ACPI PM domain.
> > +	 */
> > +	static const struct acpi_device_id special_pm_ids[] = {
> > +		{"PNP0C0B", }, /* Generic ACPI fan */
> > +		{"INT3404", }, /* Fan */
> > +		{}
> > +	};
> >  	struct acpi_device *adev = ACPI_COMPANION(dev);
> >  
> > -	if (!adev)
> > +	if (!adev || !acpi_match_device_ids(adev, special_pm_ids))
> >  		return 0;
> >  
> >  	/*
> > 
> > 
> > 
> 
>
Todd Brandt Dec. 9, 2019, 10:34 p.m. UTC | #3
On Thu, 2019-12-05 at 09:32 -0800, Todd Brandt wrote:
> On Wed, 2019-12-04 at 22:04 +0800, Zhang Rui wrote:
> > On Wed, 2019-12-04 at 02:54 +0100, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > 
> > > Certain ACPI-enumerated devices represented as platform devices
> > > in
> > > Linux, like fans, require special low-level power management
> > > handling
> > > implemented by their drivers that is not in agreement with the
> > > ACPI
> > > PM domain behavior.  That leads to problems with managing ACPI
> > > fans
> > > during system-wide suspend and resume.
> > > 
> > > For this reason, make acpi_dev_pm_attach() skip the affected
> > > devices
> > > by adding a list of device IDs to avoid to it and putting the IDs
> > > of
> > > the affected devices into that list.
> > > 
> > > Fixes: e5cc8ef31267 (ACPI / PM: Provide ACPI PM callback routines
> > > for
> > > subsystems)
> > > Reported-by: Zhang Rui <rui.zhang@intel.com>
> > > Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > > 
> > > Rui,
> > > 
> > > Please test this on the machine(s) affected by the fan
> > > suspend/resume
> > > issues.
> > 
> > Sure, Todd and I will re-run stress test with this patch applied
> > when
> > 5.5-rc1 released.
> 
> I've applied it 5.4.0 and will do a full stress test run this weekend
> in the lab (where 7 out of 20 machines have this issue). The kernel
> will be called "5.4.0-acpifanfix", and the data should be ready
> Sunday
> Dec 8.
> 
> This is the issue I'll test for:
> https://bugzilla.kernel.org/show_bug.cgi?id=204321
> 

The test data is in and I'm very happy to report that the patch works
extremely well. Here's the data gathered on the bugzilla issue 204321:
ACPI fan resume is too long.

[With the patch]

Kernel              Host                      Test Run    Count Rate
5.4.0-acpifanfix    otcpl-whl-u-clear         mem-x2351    1    0.04%
5.4.0-acpifanfix    otcpl-z170x-ud5           mem-x2369    0    0%
5.4.0-acpifanfix    otcpl-z170x-ud5           freeze-x2492 0    0%
5.4.0-acpifanfix    otcpl-whl-u-clear         freeze-x2411 0    0%
5.4.0-acpifanfix    otcpl-whl-u               mem-x2289    0    0%
5.4.0-acpifanfix    otcpl-whl-u               freeze-x2389 0    0%
5.4.0-acpifanfix    otcpl-icl-u-2             mem-x2273    0    0%
5.4.0-acpifanfix    otcpl-icl-u-2             freeze-x817  0    0%
5.4.0-acpifanfix    otcpl-glk-rvp-1           freeze-x122  0    0%
5.4.0-acpifanfix    otcpl-dell-p5510-xeon-1   mem-x1851    0    0%
5.4.0-acpifanfix    otcpl-dell-p5510-xeon-1   freeze-x1994 0    0%
5.4.0-acpifanfix    otcpl-dell-inspiron-3493  mem-x1722    0    0%
5.4.0-acpifanfix    otcpl-dell-inspiron-3493  freeze-x2102 0    0%
5.4.0-acpifanfix    otcpl-cfl-u-01            mem-x39      0    0%
5.4.0-acpifanfix    otcpl-cfl-u-01            freeze-x2091 0    0%
5.4.0-acpifanfix    otcpl-cfl-h               mem-x415     0    0%
5.4.0-acpifanfix    otcpl-cfl-h               freeze-x2265 0    0%
5.4.0-acpifanfix    otcpl-aml-y               mem-x3126    0    0%
5.4.0-acpifanfix    otcpl-aml-y               freeze-x2288 0    0%

[Without the patch]

Kernel    Host                      Test Run    Count Rate
5.3.0+    otcpl-icl-u-2             mem-x2571    2571 100.00%
5.3.0+    otcpl-whl-u               mem-x2497    2497 100.00%
5.3.0+    otcpl-cfl-h               mem-x2068    2068 100.00%
5.3.0+    otcpl-glk-rvp-1           mem-x75      74   98.67%
5.3.0+    otcpl-cfl-u-01            mem-x1074    1050 97.77%
5.3.0+    otcpl-glk-rvp-1           freeze-x45   12   26.67%
5.3.0+    otcpl-whl-u               freeze-x2649 428  16.16%
5.3.0+    otcpl-aml-y               freeze-x2434 373  15.32%
5.3.0+    otcpl-cfl-u-01            freeze-x2419 123  5.08%
5.3.0+    otcpl-latexo-ivb-cpt      freeze-x1914 97   5.07%
5.3.0+    otcpl-whl-u-clear         mem-x2640    69   2.61%
5.3.0+    otcpl-icl-u-2             freeze-x2757 59   2.14%
5.3.0+    otcpl-whl-u-clear         freeze-x2830 53   1.87%
5.3.0+    otcpl-tgl-rvp             freeze-x2086 20   0.96%
5.3.0+    otcpl-cfl-h               freeze-x2457 8    0.33%
5.3.0+    otcpl-latexo-ivb-cpt      mem-x2000    4    0.20%
5.3.0+    otcpl-aml-y               mem-x2727    2    0.07%
5.3.0+    otcpl-z170x-ud5           mem-x2669    0    0%    
5.3.0+    otcpl-z170x-ud5           freeze-x2881 0    0%    
5.3.0+    otcpl-lenovo-ideapad-130  mem-x2000    0    0%    
5.3.0+    otcpl-lenovo-ideapad-130  freeze-x2000 0    0%    
5.3.0+    otcpl-dell-p5510-xeon-2   mem-x570     0    0%    
5.3.0+    otcpl-dell-p5510-xeon-2   freeze-x1093 0    0%    
5.3.0+    otcpl-dell-p5510-xeon-1   mem-x2209    0    0%    
5.3.0+    otcpl-dell-p5510-xeon-1   freeze-x2431 0    0%    
5.3.0+    otcpl-dell-inspiron-3493  freeze-x1170 0    0%    
5.3.0+    otcpl-chromebook-hsw      freeze-x1305 0    0%    
5.3.0+    otcpl-chromebook-hsw      freeze-x368  0    0%    

> > 
> > thanks,
> > rui
> > 
> > > 
> > > I don't really see any cleaner way to address this problem,
> > > because
> > > the
> > > ACPI PM domain should not be used with the devices in question
> > > even
> > > if
> > > the driver that binds to them is not loaded.
> > > 
> > > Cheers,
> > > Rafael
> > > 
> > > ---
> > >  drivers/acpi/device_pm.c |   12 +++++++++++-
> > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > > 
> > > Index: linux-pm/drivers/acpi/device_pm.c
> > > =================================================================
> > > ==
> > > --- linux-pm.orig/drivers/acpi/device_pm.c
> > > +++ linux-pm/drivers/acpi/device_pm.c
> > > @@ -1314,9 +1314,19 @@ static void acpi_dev_pm_detach(struct de
> > >   */
> > >  int acpi_dev_pm_attach(struct device *dev, bool power_on)
> > >  {
> > > +	/*
> > > +	 * Skip devices whose ACPI companions match the device IDs
> > > below,
> > > +	 * because they require special power management handling
> > > incompatible
> > > +	 * with the generic ACPI PM domain.
> > > +	 */
> > > +	static const struct acpi_device_id special_pm_ids[] = {
> > > +		{"PNP0C0B", }, /* Generic ACPI fan */
> > > +		{"INT3404", }, /* Fan */
> > > +		{}
> > > +	};
> > >  	struct acpi_device *adev = ACPI_COMPANION(dev);
> > >  
> > > -	if (!adev)
> > > +	if (!adev || !acpi_match_device_ids(adev, special_pm_ids))
> > >  		return 0;
> > >  
> > >  	/*
> > > 
> > > 
> > > 
> > 
> >
Rafael J. Wysocki Dec. 9, 2019, 11:03 p.m. UTC | #4
On Mon, Dec 9, 2019 at 11:49 PM Todd Brandt
<todd.e.brandt@linux.intel.com> wrote:
>
> On Thu, 2019-12-05 at 09:32 -0800, Todd Brandt wrote:
> > On Wed, 2019-12-04 at 22:04 +0800, Zhang Rui wrote:
> > > On Wed, 2019-12-04 at 02:54 +0100, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > Certain ACPI-enumerated devices represented as platform devices
> > > > in
> > > > Linux, like fans, require special low-level power management
> > > > handling
> > > > implemented by their drivers that is not in agreement with the
> > > > ACPI
> > > > PM domain behavior.  That leads to problems with managing ACPI
> > > > fans
> > > > during system-wide suspend and resume.
> > > >
> > > > For this reason, make acpi_dev_pm_attach() skip the affected
> > > > devices
> > > > by adding a list of device IDs to avoid to it and putting the IDs
> > > > of
> > > > the affected devices into that list.
> > > >
> > > > Fixes: e5cc8ef31267 (ACPI / PM: Provide ACPI PM callback routines
> > > > for
> > > > subsystems)
> > > > Reported-by: Zhang Rui <rui.zhang@intel.com>
> > > > Cc: 3.10+ <stable@vger.kernel.org> # 3.10+
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > ---
> > > >
> > > > Rui,
> > > >
> > > > Please test this on the machine(s) affected by the fan
> > > > suspend/resume
> > > > issues.
> > >
> > > Sure, Todd and I will re-run stress test with this patch applied
> > > when
> > > 5.5-rc1 released.
> >
> > I've applied it 5.4.0 and will do a full stress test run this weekend
> > in the lab (where 7 out of 20 machines have this issue). The kernel
> > will be called "5.4.0-acpifanfix", and the data should be ready
> > Sunday
> > Dec 8.
> >
> > This is the issue I'll test for:
> > https://bugzilla.kernel.org/show_bug.cgi?id=204321
> >
>
> The test data is in and I'm very happy to report that the patch works
> extremely well. Here's the data gathered on the bugzilla issue 204321:
> ACPI fan resume is too long.
>
> [With the patch]
>
> Kernel              Host                      Test Run    Count Rate
> 5.4.0-acpifanfix    otcpl-whl-u-clear         mem-x2351    1    0.04%
> 5.4.0-acpifanfix    otcpl-z170x-ud5           mem-x2369    0    0%
> 5.4.0-acpifanfix    otcpl-z170x-ud5           freeze-x2492 0    0%
> 5.4.0-acpifanfix    otcpl-whl-u-clear         freeze-x2411 0    0%
> 5.4.0-acpifanfix    otcpl-whl-u               mem-x2289    0    0%
> 5.4.0-acpifanfix    otcpl-whl-u               freeze-x2389 0    0%
> 5.4.0-acpifanfix    otcpl-icl-u-2             mem-x2273    0    0%
> 5.4.0-acpifanfix    otcpl-icl-u-2             freeze-x817  0    0%
> 5.4.0-acpifanfix    otcpl-glk-rvp-1           freeze-x122  0    0%
> 5.4.0-acpifanfix    otcpl-dell-p5510-xeon-1   mem-x1851    0    0%
> 5.4.0-acpifanfix    otcpl-dell-p5510-xeon-1   freeze-x1994 0    0%
> 5.4.0-acpifanfix    otcpl-dell-inspiron-3493  mem-x1722    0    0%
> 5.4.0-acpifanfix    otcpl-dell-inspiron-3493  freeze-x2102 0    0%
> 5.4.0-acpifanfix    otcpl-cfl-u-01            mem-x39      0    0%
> 5.4.0-acpifanfix    otcpl-cfl-u-01            freeze-x2091 0    0%
> 5.4.0-acpifanfix    otcpl-cfl-h               mem-x415     0    0%
> 5.4.0-acpifanfix    otcpl-cfl-h               freeze-x2265 0    0%
> 5.4.0-acpifanfix    otcpl-aml-y               mem-x3126    0    0%
> 5.4.0-acpifanfix    otcpl-aml-y               freeze-x2288 0    0%
>
> [Without the patch]
>
> Kernel    Host                      Test Run    Count Rate
> 5.3.0+    otcpl-icl-u-2             mem-x2571    2571 100.00%
> 5.3.0+    otcpl-whl-u               mem-x2497    2497 100.00%
> 5.3.0+    otcpl-cfl-h               mem-x2068    2068 100.00%
> 5.3.0+    otcpl-glk-rvp-1           mem-x75      74   98.67%
> 5.3.0+    otcpl-cfl-u-01            mem-x1074    1050 97.77%
> 5.3.0+    otcpl-glk-rvp-1           freeze-x45   12   26.67%
> 5.3.0+    otcpl-whl-u               freeze-x2649 428  16.16%
> 5.3.0+    otcpl-aml-y               freeze-x2434 373  15.32%
> 5.3.0+    otcpl-cfl-u-01            freeze-x2419 123  5.08%
> 5.3.0+    otcpl-latexo-ivb-cpt      freeze-x1914 97   5.07%
> 5.3.0+    otcpl-whl-u-clear         mem-x2640    69   2.61%
> 5.3.0+    otcpl-icl-u-2             freeze-x2757 59   2.14%
> 5.3.0+    otcpl-whl-u-clear         freeze-x2830 53   1.87%
> 5.3.0+    otcpl-tgl-rvp             freeze-x2086 20   0.96%
> 5.3.0+    otcpl-cfl-h               freeze-x2457 8    0.33%
> 5.3.0+    otcpl-latexo-ivb-cpt      mem-x2000    4    0.20%
> 5.3.0+    otcpl-aml-y               mem-x2727    2    0.07%
> 5.3.0+    otcpl-z170x-ud5           mem-x2669    0    0%
> 5.3.0+    otcpl-z170x-ud5           freeze-x2881 0    0%
> 5.3.0+    otcpl-lenovo-ideapad-130  mem-x2000    0    0%
> 5.3.0+    otcpl-lenovo-ideapad-130  freeze-x2000 0    0%
> 5.3.0+    otcpl-dell-p5510-xeon-2   mem-x570     0    0%
> 5.3.0+    otcpl-dell-p5510-xeon-2   freeze-x1093 0    0%
> 5.3.0+    otcpl-dell-p5510-xeon-1   mem-x2209    0    0%
> 5.3.0+    otcpl-dell-p5510-xeon-1   freeze-x2431 0    0%
> 5.3.0+    otcpl-dell-inspiron-3493  freeze-x1170 0    0%
> 5.3.0+    otcpl-chromebook-hsw      freeze-x1305 0    0%
> 5.3.0+    otcpl-chromebook-hsw      freeze-x368  0    0%

Great, thanks for the verification!

I will queue up the patch for the next ACPI pull request, then.

Thanks!

Patch
diff mbox series

Index: linux-pm/drivers/acpi/device_pm.c
===================================================================
--- linux-pm.orig/drivers/acpi/device_pm.c
+++ linux-pm/drivers/acpi/device_pm.c
@@ -1314,9 +1314,19 @@  static void acpi_dev_pm_detach(struct de
  */
 int acpi_dev_pm_attach(struct device *dev, bool power_on)
 {
+	/*
+	 * Skip devices whose ACPI companions match the device IDs below,
+	 * because they require special power management handling incompatible
+	 * with the generic ACPI PM domain.
+	 */
+	static const struct acpi_device_id special_pm_ids[] = {
+		{"PNP0C0B", }, /* Generic ACPI fan */
+		{"INT3404", }, /* Fan */
+		{}
+	};
 	struct acpi_device *adev = ACPI_COMPANION(dev);
 
-	if (!adev)
+	if (!adev || !acpi_match_device_ids(adev, special_pm_ids))
 		return 0;
 
 	/*