Message ID | 1773028.iBGNyVBcMc@kreacher (mailing list archive) |
---|---|
State | Mainlined, archived |
Headers | show |
Series | ACPI: PM: Avoid attaching ACPI PM domain to certain devices | expand |
On Wed, 2019-12-04 at 02:54 +0100, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Certain ACPI-enumerated devices represented as platform devices in > Linux, like fans, require special low-level power management handling > implemented by their drivers that is not in agreement with the ACPI > PM domain behavior. That leads to problems with managing ACPI fans > during system-wide suspend and resume. > > For this reason, make acpi_dev_pm_attach() skip the affected devices > by adding a list of device IDs to avoid to it and putting the IDs of > the affected devices into that list. > > Fixes: e5cc8ef31267 (ACPI / PM: Provide ACPI PM callback routines for > subsystems) > Reported-by: Zhang Rui <rui.zhang@intel.com> > Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > --- > > Rui, > > Please test this on the machine(s) affected by the fan suspend/resume > issues. Sure, Todd and I will re-run stress test with this patch applied when 5.5-rc1 released. thanks, rui > > I don't really see any cleaner way to address this problem, because > the > ACPI PM domain should not be used with the devices in question even > if > the driver that binds to them is not loaded. > > Cheers, > Rafael > > --- > drivers/acpi/device_pm.c | 12 +++++++++++- > 1 file changed, 11 insertions(+), 1 deletion(-) > > Index: linux-pm/drivers/acpi/device_pm.c > =================================================================== > --- linux-pm.orig/drivers/acpi/device_pm.c > +++ linux-pm/drivers/acpi/device_pm.c > @@ -1314,9 +1314,19 @@ static void acpi_dev_pm_detach(struct de > */ > int acpi_dev_pm_attach(struct device *dev, bool power_on) > { > + /* > + * Skip devices whose ACPI companions match the device IDs > below, > + * because they require special power management handling > incompatible > + * with the generic ACPI PM domain. > + */ > + static const struct acpi_device_id special_pm_ids[] = { > + {"PNP0C0B", }, /* Generic ACPI fan */ > + {"INT3404", }, /* Fan */ > + {} > + }; > struct acpi_device *adev = ACPI_COMPANION(dev); > > - if (!adev) > + if (!adev || !acpi_match_device_ids(adev, special_pm_ids)) > return 0; > > /* > > >
On Wed, 2019-12-04 at 22:04 +0800, Zhang Rui wrote: > On Wed, 2019-12-04 at 02:54 +0100, Rafael J. Wysocki wrote: > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > Certain ACPI-enumerated devices represented as platform devices in > > Linux, like fans, require special low-level power management > > handling > > implemented by their drivers that is not in agreement with the ACPI > > PM domain behavior. That leads to problems with managing ACPI fans > > during system-wide suspend and resume. > > > > For this reason, make acpi_dev_pm_attach() skip the affected > > devices > > by adding a list of device IDs to avoid to it and putting the IDs > > of > > the affected devices into that list. > > > > Fixes: e5cc8ef31267 (ACPI / PM: Provide ACPI PM callback routines > > for > > subsystems) > > Reported-by: Zhang Rui <rui.zhang@intel.com> > > Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > --- > > > > Rui, > > > > Please test this on the machine(s) affected by the fan > > suspend/resume > > issues. > > Sure, Todd and I will re-run stress test with this patch applied when > 5.5-rc1 released. I've applied it 5.4.0 and will do a full stress test run this weekend in the lab (where 7 out of 20 machines have this issue). The kernel will be called "5.4.0-acpifanfix", and the data should be ready Sunday Dec 8. This is the issue I'll test for: https://bugzilla.kernel.org/show_bug.cgi?id=204321 > > thanks, > rui > > > > > I don't really see any cleaner way to address this problem, because > > the > > ACPI PM domain should not be used with the devices in question even > > if > > the driver that binds to them is not loaded. > > > > Cheers, > > Rafael > > > > --- > > drivers/acpi/device_pm.c | 12 +++++++++++- > > 1 file changed, 11 insertions(+), 1 deletion(-) > > > > Index: linux-pm/drivers/acpi/device_pm.c > > =================================================================== > > --- linux-pm.orig/drivers/acpi/device_pm.c > > +++ linux-pm/drivers/acpi/device_pm.c > > @@ -1314,9 +1314,19 @@ static void acpi_dev_pm_detach(struct de > > */ > > int acpi_dev_pm_attach(struct device *dev, bool power_on) > > { > > + /* > > + * Skip devices whose ACPI companions match the device IDs > > below, > > + * because they require special power management handling > > incompatible > > + * with the generic ACPI PM domain. > > + */ > > + static const struct acpi_device_id special_pm_ids[] = { > > + {"PNP0C0B", }, /* Generic ACPI fan */ > > + {"INT3404", }, /* Fan */ > > + {} > > + }; > > struct acpi_device *adev = ACPI_COMPANION(dev); > > > > - if (!adev) > > + if (!adev || !acpi_match_device_ids(adev, special_pm_ids)) > > return 0; > > > > /* > > > > > > > >
On Thu, 2019-12-05 at 09:32 -0800, Todd Brandt wrote: > On Wed, 2019-12-04 at 22:04 +0800, Zhang Rui wrote: > > On Wed, 2019-12-04 at 02:54 +0100, Rafael J. Wysocki wrote: > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > Certain ACPI-enumerated devices represented as platform devices > > > in > > > Linux, like fans, require special low-level power management > > > handling > > > implemented by their drivers that is not in agreement with the > > > ACPI > > > PM domain behavior. That leads to problems with managing ACPI > > > fans > > > during system-wide suspend and resume. > > > > > > For this reason, make acpi_dev_pm_attach() skip the affected > > > devices > > > by adding a list of device IDs to avoid to it and putting the IDs > > > of > > > the affected devices into that list. > > > > > > Fixes: e5cc8ef31267 (ACPI / PM: Provide ACPI PM callback routines > > > for > > > subsystems) > > > Reported-by: Zhang Rui <rui.zhang@intel.com> > > > Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > --- > > > > > > Rui, > > > > > > Please test this on the machine(s) affected by the fan > > > suspend/resume > > > issues. > > > > Sure, Todd and I will re-run stress test with this patch applied > > when > > 5.5-rc1 released. > > I've applied it 5.4.0 and will do a full stress test run this weekend > in the lab (where 7 out of 20 machines have this issue). The kernel > will be called "5.4.0-acpifanfix", and the data should be ready > Sunday > Dec 8. > > This is the issue I'll test for: > https://bugzilla.kernel.org/show_bug.cgi?id=204321 > The test data is in and I'm very happy to report that the patch works extremely well. Here's the data gathered on the bugzilla issue 204321: ACPI fan resume is too long. [With the patch] Kernel Host Test Run Count Rate 5.4.0-acpifanfix otcpl-whl-u-clear mem-x2351 1 0.04% 5.4.0-acpifanfix otcpl-z170x-ud5 mem-x2369 0 0% 5.4.0-acpifanfix otcpl-z170x-ud5 freeze-x2492 0 0% 5.4.0-acpifanfix otcpl-whl-u-clear freeze-x2411 0 0% 5.4.0-acpifanfix otcpl-whl-u mem-x2289 0 0% 5.4.0-acpifanfix otcpl-whl-u freeze-x2389 0 0% 5.4.0-acpifanfix otcpl-icl-u-2 mem-x2273 0 0% 5.4.0-acpifanfix otcpl-icl-u-2 freeze-x817 0 0% 5.4.0-acpifanfix otcpl-glk-rvp-1 freeze-x122 0 0% 5.4.0-acpifanfix otcpl-dell-p5510-xeon-1 mem-x1851 0 0% 5.4.0-acpifanfix otcpl-dell-p5510-xeon-1 freeze-x1994 0 0% 5.4.0-acpifanfix otcpl-dell-inspiron-3493 mem-x1722 0 0% 5.4.0-acpifanfix otcpl-dell-inspiron-3493 freeze-x2102 0 0% 5.4.0-acpifanfix otcpl-cfl-u-01 mem-x39 0 0% 5.4.0-acpifanfix otcpl-cfl-u-01 freeze-x2091 0 0% 5.4.0-acpifanfix otcpl-cfl-h mem-x415 0 0% 5.4.0-acpifanfix otcpl-cfl-h freeze-x2265 0 0% 5.4.0-acpifanfix otcpl-aml-y mem-x3126 0 0% 5.4.0-acpifanfix otcpl-aml-y freeze-x2288 0 0% [Without the patch] Kernel Host Test Run Count Rate 5.3.0+ otcpl-icl-u-2 mem-x2571 2571 100.00% 5.3.0+ otcpl-whl-u mem-x2497 2497 100.00% 5.3.0+ otcpl-cfl-h mem-x2068 2068 100.00% 5.3.0+ otcpl-glk-rvp-1 mem-x75 74 98.67% 5.3.0+ otcpl-cfl-u-01 mem-x1074 1050 97.77% 5.3.0+ otcpl-glk-rvp-1 freeze-x45 12 26.67% 5.3.0+ otcpl-whl-u freeze-x2649 428 16.16% 5.3.0+ otcpl-aml-y freeze-x2434 373 15.32% 5.3.0+ otcpl-cfl-u-01 freeze-x2419 123 5.08% 5.3.0+ otcpl-latexo-ivb-cpt freeze-x1914 97 5.07% 5.3.0+ otcpl-whl-u-clear mem-x2640 69 2.61% 5.3.0+ otcpl-icl-u-2 freeze-x2757 59 2.14% 5.3.0+ otcpl-whl-u-clear freeze-x2830 53 1.87% 5.3.0+ otcpl-tgl-rvp freeze-x2086 20 0.96% 5.3.0+ otcpl-cfl-h freeze-x2457 8 0.33% 5.3.0+ otcpl-latexo-ivb-cpt mem-x2000 4 0.20% 5.3.0+ otcpl-aml-y mem-x2727 2 0.07% 5.3.0+ otcpl-z170x-ud5 mem-x2669 0 0% 5.3.0+ otcpl-z170x-ud5 freeze-x2881 0 0% 5.3.0+ otcpl-lenovo-ideapad-130 mem-x2000 0 0% 5.3.0+ otcpl-lenovo-ideapad-130 freeze-x2000 0 0% 5.3.0+ otcpl-dell-p5510-xeon-2 mem-x570 0 0% 5.3.0+ otcpl-dell-p5510-xeon-2 freeze-x1093 0 0% 5.3.0+ otcpl-dell-p5510-xeon-1 mem-x2209 0 0% 5.3.0+ otcpl-dell-p5510-xeon-1 freeze-x2431 0 0% 5.3.0+ otcpl-dell-inspiron-3493 freeze-x1170 0 0% 5.3.0+ otcpl-chromebook-hsw freeze-x1305 0 0% 5.3.0+ otcpl-chromebook-hsw freeze-x368 0 0% > > > > thanks, > > rui > > > > > > > > I don't really see any cleaner way to address this problem, > > > because > > > the > > > ACPI PM domain should not be used with the devices in question > > > even > > > if > > > the driver that binds to them is not loaded. > > > > > > Cheers, > > > Rafael > > > > > > --- > > > drivers/acpi/device_pm.c | 12 +++++++++++- > > > 1 file changed, 11 insertions(+), 1 deletion(-) > > > > > > Index: linux-pm/drivers/acpi/device_pm.c > > > ================================================================= > > > == > > > --- linux-pm.orig/drivers/acpi/device_pm.c > > > +++ linux-pm/drivers/acpi/device_pm.c > > > @@ -1314,9 +1314,19 @@ static void acpi_dev_pm_detach(struct de > > > */ > > > int acpi_dev_pm_attach(struct device *dev, bool power_on) > > > { > > > + /* > > > + * Skip devices whose ACPI companions match the device IDs > > > below, > > > + * because they require special power management handling > > > incompatible > > > + * with the generic ACPI PM domain. > > > + */ > > > + static const struct acpi_device_id special_pm_ids[] = { > > > + {"PNP0C0B", }, /* Generic ACPI fan */ > > > + {"INT3404", }, /* Fan */ > > > + {} > > > + }; > > > struct acpi_device *adev = ACPI_COMPANION(dev); > > > > > > - if (!adev) > > > + if (!adev || !acpi_match_device_ids(adev, special_pm_ids)) > > > return 0; > > > > > > /* > > > > > > > > > > > > >
On Mon, Dec 9, 2019 at 11:49 PM Todd Brandt <todd.e.brandt@linux.intel.com> wrote: > > On Thu, 2019-12-05 at 09:32 -0800, Todd Brandt wrote: > > On Wed, 2019-12-04 at 22:04 +0800, Zhang Rui wrote: > > > On Wed, 2019-12-04 at 02:54 +0100, Rafael J. Wysocki wrote: > > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > > > > > Certain ACPI-enumerated devices represented as platform devices > > > > in > > > > Linux, like fans, require special low-level power management > > > > handling > > > > implemented by their drivers that is not in agreement with the > > > > ACPI > > > > PM domain behavior. That leads to problems with managing ACPI > > > > fans > > > > during system-wide suspend and resume. > > > > > > > > For this reason, make acpi_dev_pm_attach() skip the affected > > > > devices > > > > by adding a list of device IDs to avoid to it and putting the IDs > > > > of > > > > the affected devices into that list. > > > > > > > > Fixes: e5cc8ef31267 (ACPI / PM: Provide ACPI PM callback routines > > > > for > > > > subsystems) > > > > Reported-by: Zhang Rui <rui.zhang@intel.com> > > > > Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ > > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > > > --- > > > > > > > > Rui, > > > > > > > > Please test this on the machine(s) affected by the fan > > > > suspend/resume > > > > issues. > > > > > > Sure, Todd and I will re-run stress test with this patch applied > > > when > > > 5.5-rc1 released. > > > > I've applied it 5.4.0 and will do a full stress test run this weekend > > in the lab (where 7 out of 20 machines have this issue). The kernel > > will be called "5.4.0-acpifanfix", and the data should be ready > > Sunday > > Dec 8. > > > > This is the issue I'll test for: > > https://bugzilla.kernel.org/show_bug.cgi?id=204321 > > > > The test data is in and I'm very happy to report that the patch works > extremely well. Here's the data gathered on the bugzilla issue 204321: > ACPI fan resume is too long. > > [With the patch] > > Kernel Host Test Run Count Rate > 5.4.0-acpifanfix otcpl-whl-u-clear mem-x2351 1 0.04% > 5.4.0-acpifanfix otcpl-z170x-ud5 mem-x2369 0 0% > 5.4.0-acpifanfix otcpl-z170x-ud5 freeze-x2492 0 0% > 5.4.0-acpifanfix otcpl-whl-u-clear freeze-x2411 0 0% > 5.4.0-acpifanfix otcpl-whl-u mem-x2289 0 0% > 5.4.0-acpifanfix otcpl-whl-u freeze-x2389 0 0% > 5.4.0-acpifanfix otcpl-icl-u-2 mem-x2273 0 0% > 5.4.0-acpifanfix otcpl-icl-u-2 freeze-x817 0 0% > 5.4.0-acpifanfix otcpl-glk-rvp-1 freeze-x122 0 0% > 5.4.0-acpifanfix otcpl-dell-p5510-xeon-1 mem-x1851 0 0% > 5.4.0-acpifanfix otcpl-dell-p5510-xeon-1 freeze-x1994 0 0% > 5.4.0-acpifanfix otcpl-dell-inspiron-3493 mem-x1722 0 0% > 5.4.0-acpifanfix otcpl-dell-inspiron-3493 freeze-x2102 0 0% > 5.4.0-acpifanfix otcpl-cfl-u-01 mem-x39 0 0% > 5.4.0-acpifanfix otcpl-cfl-u-01 freeze-x2091 0 0% > 5.4.0-acpifanfix otcpl-cfl-h mem-x415 0 0% > 5.4.0-acpifanfix otcpl-cfl-h freeze-x2265 0 0% > 5.4.0-acpifanfix otcpl-aml-y mem-x3126 0 0% > 5.4.0-acpifanfix otcpl-aml-y freeze-x2288 0 0% > > [Without the patch] > > Kernel Host Test Run Count Rate > 5.3.0+ otcpl-icl-u-2 mem-x2571 2571 100.00% > 5.3.0+ otcpl-whl-u mem-x2497 2497 100.00% > 5.3.0+ otcpl-cfl-h mem-x2068 2068 100.00% > 5.3.0+ otcpl-glk-rvp-1 mem-x75 74 98.67% > 5.3.0+ otcpl-cfl-u-01 mem-x1074 1050 97.77% > 5.3.0+ otcpl-glk-rvp-1 freeze-x45 12 26.67% > 5.3.0+ otcpl-whl-u freeze-x2649 428 16.16% > 5.3.0+ otcpl-aml-y freeze-x2434 373 15.32% > 5.3.0+ otcpl-cfl-u-01 freeze-x2419 123 5.08% > 5.3.0+ otcpl-latexo-ivb-cpt freeze-x1914 97 5.07% > 5.3.0+ otcpl-whl-u-clear mem-x2640 69 2.61% > 5.3.0+ otcpl-icl-u-2 freeze-x2757 59 2.14% > 5.3.0+ otcpl-whl-u-clear freeze-x2830 53 1.87% > 5.3.0+ otcpl-tgl-rvp freeze-x2086 20 0.96% > 5.3.0+ otcpl-cfl-h freeze-x2457 8 0.33% > 5.3.0+ otcpl-latexo-ivb-cpt mem-x2000 4 0.20% > 5.3.0+ otcpl-aml-y mem-x2727 2 0.07% > 5.3.0+ otcpl-z170x-ud5 mem-x2669 0 0% > 5.3.0+ otcpl-z170x-ud5 freeze-x2881 0 0% > 5.3.0+ otcpl-lenovo-ideapad-130 mem-x2000 0 0% > 5.3.0+ otcpl-lenovo-ideapad-130 freeze-x2000 0 0% > 5.3.0+ otcpl-dell-p5510-xeon-2 mem-x570 0 0% > 5.3.0+ otcpl-dell-p5510-xeon-2 freeze-x1093 0 0% > 5.3.0+ otcpl-dell-p5510-xeon-1 mem-x2209 0 0% > 5.3.0+ otcpl-dell-p5510-xeon-1 freeze-x2431 0 0% > 5.3.0+ otcpl-dell-inspiron-3493 freeze-x1170 0 0% > 5.3.0+ otcpl-chromebook-hsw freeze-x1305 0 0% > 5.3.0+ otcpl-chromebook-hsw freeze-x368 0 0% Great, thanks for the verification! I will queue up the patch for the next ACPI pull request, then. Thanks!
Index: linux-pm/drivers/acpi/device_pm.c =================================================================== --- linux-pm.orig/drivers/acpi/device_pm.c +++ linux-pm/drivers/acpi/device_pm.c @@ -1314,9 +1314,19 @@ static void acpi_dev_pm_detach(struct de */ int acpi_dev_pm_attach(struct device *dev, bool power_on) { + /* + * Skip devices whose ACPI companions match the device IDs below, + * because they require special power management handling incompatible + * with the generic ACPI PM domain. + */ + static const struct acpi_device_id special_pm_ids[] = { + {"PNP0C0B", }, /* Generic ACPI fan */ + {"INT3404", }, /* Fan */ + {} + }; struct acpi_device *adev = ACPI_COMPANION(dev); - if (!adev) + if (!adev || !acpi_match_device_ids(adev, special_pm_ids)) return 0; /*