diff mbox

PM/ runtime: fix resume from suspend on newer hp zbook/elitebook

Message ID 20180515133508.2913-1-kugel@rockbox.org (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Thomas Martitz May 15, 2018, 1:35 p.m. UTC
In 08810a4119aaebf6318f209ec5dd9828e969cba4 setting
dev->power.direct_complete was made conditional on
pm_runtime_suspended().

The justification was:
    While at it, make the core check pm_runtime_suspended() when
    setting power.direct_complete so that it doesn't need to be
    checked by ->prepare callbacks.

However, this breaks resuming from suspend on those newer HP laptops
if the amdgpu driver is used (due to hybrid intel+radeon graphics). Given
the justification for the change, undoing it seems best as it
appears to have unintended side effects.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199693
References: https://bugs.freedesktop.org/show_bug.cgi?id=106447
Signed-off-by: Thomas Martitz <kugel@rockbox.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Len Brown <len.brown@intel.com>
Cc: <linux-pm@vger.kernel.org>
Cc: <stable@vger.kernel.org> [4.15+]
Signed-off-by: Thomas Martitz <kugel@rockbox.org>
---
 drivers/base/power/main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Rafael J. Wysocki May 15, 2018, 3:10 p.m. UTC | #1
On Tue, May 15, 2018 at 3:35 PM, Thomas Martitz <kugel@rockbox.org> wrote:
> In 08810a4119aaebf6318f209ec5dd9828e969cba4 setting
> dev->power.direct_complete was made conditional on
> pm_runtime_suspended().
>
> The justification was:
>     While at it, make the core check pm_runtime_suspended() when
>     setting power.direct_complete so that it doesn't need to be
>     checked by ->prepare callbacks.
>
> However, this breaks resuming from suspend on those newer HP laptops
> if the amdgpu driver is used (due to hybrid intel+radeon graphics). Given
> the justification for the change, undoing it seems best as it
> appears to have unintended side effects.
>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199693
> References: https://bugs.freedesktop.org/show_bug.cgi?id=106447
> Signed-off-by: Thomas Martitz <kugel@rockbox.org>
> Cc: Pavel Machek <pavel@ucw.cz>
> Cc: Len Brown <len.brown@intel.com>
> Cc: <linux-pm@vger.kernel.org>
> Cc: <stable@vger.kernel.org> [4.15+]
> Signed-off-by: Thomas Martitz <kugel@rockbox.org>
> ---
>  drivers/base/power/main.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index 02a497e7c785..b2fb0974f832 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -1960,8 +1960,7 @@ static int device_prepare(struct device *dev, pm_message_t state)
>          */
>         spin_lock_irq(&dev->power.lock);
>         dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
> -               pm_runtime_suspended(dev) && ret > 0 &&
> -               !dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
> +               ret > 0 && !dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
>         spin_unlock_irq(&dev->power.lock);
>         return 0;
>  }
> --

Before applying anything like this I need to understand the failure in
the first place.

Since direct_complete is optional anyway and it should be safe to call
pm_runtime_suspended() at any time, I'm suspecting a bug in a driver
that is exposed by the commit turned up by bisection.

Where's the code I need to look at?
Thomas Martitz May 15, 2018, 7:39 p.m. UTC | #2
Am 15.05.2018 um 17:10 schrieb Rafael J. Wysocki:

> Before applying anything like this I need to understand the failure in
> the first place.
> 
> Since direct_complete is optional anyway and it should be safe to call
> pm_runtime_suspended() at any time, I'm suspecting a bug in a driver
> that is exposed by the commit turned up by bisection.
> 
> Where's the code I need to look at?
> 

Hello,

thanks for looking into this. To answer the question I need your 
expertise. I couldn't find out which device causes it. Since the freeze 
only happens of amdgpu is loaded I naturally thought the problematic 
device(s) would be bound to amdgpu. So I assumed the amdgpu-bound 
devices would be affected by your change, i.e. not have direct_complete 
set anymore. But I could not confirm this evidence during testing (I 
basically restricted my patch to devices which pass 
!strcmp(dev->driver->name, "amdgpu", and the system was still frozen 
upon resume).

Then I printed a list which pass "dev->direct_complete && 
!pm_runtime_suspended(dev)" (with this patch applied), and the list was 
*very* long (maybe 100+). I can perform this again if you find the list 
useful.

The sheer number of devices which pass  "dev->direct_complete && 
!pm_runtime_suspended(dev)" made me think that your change should be 
re-considered, hence my patch.

So, please give me advice on how I can point you to the code that you'd 
like to look at. I assume key information which device(s) pass 
"dev->direct_complete && !pm_runtime_suspended(dev)" AND cause the 
freeze on !dev->direct_complete? I'm afraid that will take a very long 
time to find out.

Best regards.
Rafael J. Wysocki May 18, 2018, 10:26 a.m. UTC | #3
On Tue, May 15, 2018 at 9:39 PM, Thomas Martitz <kugel@rockbox.org> wrote:
> Am 15.05.2018 um 17:10 schrieb Rafael J. Wysocki:
>
>> Before applying anything like this I need to understand the failure in
>> the first place.
>>
>> Since direct_complete is optional anyway and it should be safe to call
>> pm_runtime_suspended() at any time, I'm suspecting a bug in a driver
>> that is exposed by the commit turned up by bisection.
>>
>> Where's the code I need to look at?
>>
>
> Hello,
>
> thanks for looking into this. To answer the question I need your expertise.
> I couldn't find out which device causes it. Since the freeze only happens of
> amdgpu is loaded I naturally thought the problematic device(s) would be
> bound to amdgpu. So I assumed the amdgpu-bound devices would be affected by
> your change, i.e. not have direct_complete set anymore. But I could not
> confirm this evidence during testing (I basically restricted my patch to
> devices which pass !strcmp(dev->driver->name, "amdgpu", and the system was
> still frozen upon resume).
>
> Then I printed a list which pass "dev->direct_complete &&
> !pm_runtime_suspended(dev)" (with this patch applied), and the list was
> *very* long (maybe 100+). I can perform this again if you find the list
> useful.
>
> The sheer number of devices which pass  "dev->direct_complete &&
> !pm_runtime_suspended(dev)" made me think that your change should be
> re-considered, hence my patch.
>
> So, please give me advice on how I can point you to the code that you'd like
> to look at. I assume key information which device(s) pass
> "dev->direct_complete && !pm_runtime_suspended(dev)" AND cause the freeze on
> !dev->direct_complete? I'm afraid that will take a very long time to find
> out.

Let's continue this in the Bugzilla entry at

https://bugzilla.kernel.org/show_bug.cgi?id=199693

Thanks,
Rafael
diff mbox

Patch

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 02a497e7c785..b2fb0974f832 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -1960,8 +1960,7 @@  static int device_prepare(struct device *dev, pm_message_t state)
 	 */
 	spin_lock_irq(&dev->power.lock);
 	dev->power.direct_complete = state.event == PM_EVENT_SUSPEND &&
-		pm_runtime_suspended(dev) && ret > 0 &&
-		!dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
+		ret > 0 && !dev_pm_test_driver_flags(dev, DPM_FLAG_NEVER_SKIP);
 	spin_unlock_irq(&dev->power.lock);
 	return 0;
 }