[v2] drm/i915: gen4: work around hang during hibernation

Message ID 1426671436.2649.19.camel@tiscali.nl (mailing list archive)
State New, archived

Commit Message

Paul Bolle March 18, 2015, 9:37 a.m. UTC
Imre Deak wrote on Mon 02-03-2015 at 13:04 [+0200]:
> Bjørn reported that his machine hung during hibernation and eventually
> bisected the problem to the following commit:
> 
> commit da2bc1b9db3351addd293e5b82757efe1f77ed1d
> Author: Imre Deak <imre.deak@intel.com>
> Date:   Thu Oct 23 19:23:26 2014 +0300
> 
>     drm/i915: add poweroff_late handler
> 
> The problem seems to be that after the kernel puts the device into D3
> the BIOS still tries to access it, or otherwise assumes that it's in D0.
> This is clearly bogus, since ACPI mandates that devices are put into D3
> by the OSPM if they are not wake-up sources. In the future we want to
> unify more of the driver's runtime and system suspend paths, for example
> by skipping all the system suspend/hibernation hooks if the device is
> runtime suspended already. Accordingly for all other platforms the goal
> is still to properly power down the device during hibernation.
> 
> v2:
> - Another GEN4 Lenovo laptop had the same issue, while platforms from
>   other vendors (including mobile and desktop, GEN4 and non-GEN4) seem
>   to work fine. Based on this apply the workaround on all GEN4 Lenovo
>   platforms.
> - add code comment about failing platforms (Ville)

The outdated ThinkPad X41 that I torture by running rc's showed
identical symptoms, also since v3.19-rc1. It uses a gen3 chipset (it has
a 915GM, I think, but I keep forgetting details like that).

I did everything wrong to get this fixed (1: hope this gets magically
fixed; 2: bisect it myself, thinking every now and then that I know
better than git bisect which commit to choose; 3: finally grep lkml). So
here I am late to the show.

> Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-February/060633.html
> Reported-and-bisected-by: Bjørn Mork <bjorn@mork.no>
> Signed-off-by: Imre Deak <imre.deak@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 30 +++++++++++++++++++++++++-----
>  1 file changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 4badb23..ff3662f 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -637,7 +637,7 @@ static int i915_drm_suspend(struct drm_device *dev)
>  	return 0;
>  }
>  
> -static int i915_drm_suspend_late(struct drm_device *drm_dev)
> +static int i915_drm_suspend_late(struct drm_device *drm_dev, bool hibernation)
>  {
>  	struct drm_i915_private *dev_priv = drm_dev->dev_private;
>  	int ret;
> @@ -651,7 +651,17 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev)
>  	}
>  
>  	pci_disable_device(drm_dev->pdev);
> -	pci_set_power_state(drm_dev->pdev, PCI_D3hot);
> +	/*
> +	 * During hibernation on some GEN4 platforms the BIOS may try to access
> +	 * the device even though it's already in D3 and hang the machine. So
> +	 * leave the device in D0 on those platforms and hope the BIOS will
> +	 * power down the device properly. Platforms where this was seen:
> +	 * Lenovo Thinkpad X301, X61s
> +	 */
> +	if (!(hibernation &&
> +	      drm_dev->pdev->subsystem_vendor == PCI_VENDOR_ID_LENOVO &&
> +	      INTEL_INFO(dev_priv)->gen == 4))
> +		pci_set_power_state(drm_dev->pdev, PCI_D3hot);
>  
>  	return 0;
>  }

I'll paste a DRAFT patch that fixes this for that X41 at the end of the
message. The patch is rather ugly. Should we perhaps try a quirk table
or something like that?
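
For illustration, a minimal sketch of what such a quirk table could look
like. It is plain userspace C, not driver code: the struct, the table and
the helper keep_d0_for_hibernation() are hypothetical names. The two ids
are the values I believe pci_ids.h gives for PCI_SUBVENDOR_ID_IBM and
PCI_VENDOR_ID_LENOVO. The entries are chosen to match the draft patch
(IBM or Lenovo, gen3 or gen4), not a verified list of broken machines.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values of PCI_SUBVENDOR_ID_IBM and PCI_VENDOR_ID_LENOVO. */
#define SKETCH_SUBVENDOR_IBM    0x1014
#define SKETCH_SUBVENDOR_LENOVO 0x17aa

/* Hypothetical quirk entry: a subsystem vendor and an inclusive gen range. */
struct hibernate_d0_quirk {
	uint16_t subsystem_vendor;
	int gen_min;
	int gen_max;
};

/* Same coverage as the draft patch: IBM or Lenovo, gen3 or gen4. */
static const struct hibernate_d0_quirk hibernate_d0_quirks[] = {
	{ SKETCH_SUBVENDOR_IBM,    3, 4 },
	{ SKETCH_SUBVENDOR_LENOVO, 3, 4 },
};

/* Returns true if the device should be left in D0 when hibernating. */
static bool keep_d0_for_hibernation(uint16_t subsystem_vendor, int gen)
{
	size_t n = sizeof(hibernate_d0_quirks) / sizeof(hibernate_d0_quirks[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		const struct hibernate_d0_quirk *q = &hibernate_d0_quirks[i];

		if (q->subsystem_vendor == subsystem_vendor &&
		    gen >= q->gen_min && gen <= q->gen_max)
			return true;
	}
	return false;
}
```

With something like this, the condition in i915_drm_suspend_late() would
shrink to "hibernation && keep_d0_for_hibernation(...)", and adding
another affected platform would be a one-line table change.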


Paul Bolle

-------->8--------
Subject: [PATCH] drm/i915: work around hang during hibernation on gen3 too

Commit ab3be73fa7b4 ("drm/i915: gen4: work around hang during
hibernation") was targeted at gen4 platforms shipped by Lenovo. The
same problem can also be seen on a Lenovo ThinkPad X41. Expand the test
to catch that system too.

Sadly, this system still uses IBM's subsystem vendor id. So we end up
with a rather unpleasant test. Use the IS_GEN3() and IS_GEN4() macros to
lessen the pain a bit.

Not-yet-signed-off-by: Paul Bolle <pebolle@tiscali.nl>
---
 drivers/gpu/drm/i915/i915_drv.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

Comments

Ville Syrjälä March 18, 2015, 10:22 a.m. UTC | #1
On Wed, Mar 18, 2015 at 10:37:16AM +0100, Paul Bolle wrote:
> Imre Deak wrote on Mon 02-03-2015 at 13:04 [+0200]:
> > Bjørn reported that his machine hung during hibernation and eventually
> > bisected the problem to the following commit:
> > 
> > commit da2bc1b9db3351addd293e5b82757efe1f77ed1d
> > Author: Imre Deak <imre.deak@intel.com>
> > Date:   Thu Oct 23 19:23:26 2014 +0300
> > 
> >     drm/i915: add poweroff_late handler
> > 
> > The problem seems to be that after the kernel puts the device into D3
> > the BIOS still tries to access it, or otherwise assumes that it's in D0.
> > This is clearly bogus, since ACPI mandates that devices are put into D3
> > by the OSPM if they are not wake-up sources. In the future we want to
> > unify more of the driver's runtime and system suspend paths, for example
> > by skipping all the system suspend/hibernation hooks if the device is
> > runtime suspended already. Accordingly for all other platforms the goal
> > is still to properly power down the device during hibernation.
> > 
> > v2:
> > - Another GEN4 Lenovo laptop had the same issue, while platforms from
> >   other vendors (including mobile and desktop, GEN4 and non-GEN4) seem
> >   to work fine. Based on this apply the workaround on all GEN4 Lenovo
> >   platforms.
> > - add code comment about failing platforms (Ville)
> 
> The outdated ThinkPad X41 that I torture by running rc's showed
> identical symptoms, also since v3.19-rc1. It uses a gen3 chipset (it has
> a 915GM, I think, but I keep forgetting details like that).
> 
> I did everything wrong to get this fixed (1: hope this gets magically
> fixed; 2: bisect it myself, thinking every now and then that I know
> better than git bisect which commit to choose; 3: finally grep lkml). So
> here I am late to the show.
> 
> > Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-February/060633.html
> > Reported-and-bisected-by: Bjørn Mork <bjorn@mork.no>
> > Signed-off-by: Imre Deak <imre.deak@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.c | 30 +++++++++++++++++++++++++-----
> >  1 file changed, 25 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 4badb23..ff3662f 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -637,7 +637,7 @@ static int i915_drm_suspend(struct drm_device *dev)
> >  	return 0;
> >  }
> >  
> > -static int i915_drm_suspend_late(struct drm_device *drm_dev)
> > +static int i915_drm_suspend_late(struct drm_device *drm_dev, bool hibernation)
> >  {
> >  	struct drm_i915_private *dev_priv = drm_dev->dev_private;
> >  	int ret;
> > @@ -651,7 +651,17 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev)
> >  	}
> >  
> >  	pci_disable_device(drm_dev->pdev);
> > -	pci_set_power_state(drm_dev->pdev, PCI_D3hot);
> > +	/*
> > +	 * During hibernation on some GEN4 platforms the BIOS may try to access
> > +	 * the device even though it's already in D3 and hang the machine. So
> > +	 * leave the device in D0 on those platforms and hope the BIOS will
> > +	 * power down the device properly. Platforms where this was seen:
> > +	 * Lenovo Thinkpad X301, X61s
> > +	 */
> > +	if (!(hibernation &&
> > +	      drm_dev->pdev->subsystem_vendor == PCI_VENDOR_ID_LENOVO &&
> > +	      INTEL_INFO(dev_priv)->gen == 4))
> > +		pci_set_power_state(drm_dev->pdev, PCI_D3hot);
> >  
> >  	return 0;
> >  }
> 
> I'll paste a DRAFT patch that fixes this for that X41 at the end of the
> message. The patch is rather ugly. Should we perhaps try a quirk table
> or something like that?
> 
> 
> Paul Bolle
> 
> -------->8--------
> Subject: [PATCH] drm/i915: work around hang during hibernation on gen3 too
> 
> Commit ab3be73fa7b4 ("drm/i915: gen4: work around hang during
> hibernation") was targeted at gen4 platforms shipped by Lenovo. The
> same problem can also be seen on a Lenovo ThinkPad X41. Expand the test
> to catch that system too.
> 
> Sadly, this system still uses IBM's subsystem vendor id. So we end up
> with a rather unpleasant test. Use the IS_GEN3() and IS_GEN4() macros to
> lessen the pain a bit.

We had another bug report which showed similar problems on something
as recent as SNB:
https://bugzilla.kernel.org/show_bug.cgi?id=94241
So I guess we really want to make the check 'gen < 7'.

My IVB X1 Carbon doesn't need this quirk, so hopefully that indicates
the Lenovo BIOSen became more sane for gen7+.
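
A rough sketch of the 'gen < 7' variant, written as a standalone
predicate so it can be tried outside the kernel. The function name and
signature are hypothetical, and it assumes the Lenovo subsystem vendor
check from v2 stays in place (0x17aa is the value I believe
PCI_VENDOR_ID_LENOVO has). Note that, as written, it would still miss
machines that report IBM's subsystem vendor id, such as the X41.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of PCI_VENDOR_ID_LENOVO. */
#define SKETCH_SUBVENDOR_LENOVO 0x17aa

/*
 * Returns true when the device should be put into D3hot.
 * The device is left in D0 only when hibernating on a Lenovo
 * platform older than gen7.
 */
static bool should_enter_d3hot(bool hibernation, uint16_t subsystem_vendor,
			       int gen)
{
	bool keep_d0 = hibernation &&
		       subsystem_vendor == SKETCH_SUBVENDOR_LENOVO &&
		       gen < 7;

	return !keep_d0;
}
```

Under this check a gen6 Lenovo machine keeps the device in D0 on
hibernation, while gen7 and later, non-Lenovo platforms, and ordinary
suspend all still go to D3hot.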

> 
> Not-yet-signed-off-by: Paul Bolle <pebolle@tiscali.nl>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index cc6ea53d2b81..3a07164f5860 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -641,11 +641,12 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev, bool hibernation)
>  	 * the device even though it's already in D3 and hang the machine. So
>  	 * leave the device in D0 on those platforms and hope the BIOS will
>  	 * power down the device properly. Platforms where this was seen:
> -	 * Lenovo Thinkpad X301, X61s
> +	 * Lenovo Thinkpad X301, X61s, X41
>  	 */
>  	if (!(hibernation &&
> -	      drm_dev->pdev->subsystem_vendor == PCI_VENDOR_ID_LENOVO &&
> -	      INTEL_INFO(dev_priv)->gen == 4))
> +	      (drm_dev->pdev->subsystem_vendor == PCI_VENDOR_ID_LENOVO ||
> +	       drm_dev->pdev->subsystem_vendor == PCI_SUBVENDOR_ID_IBM) &&
> +	      (IS_GEN3(dev_priv) || IS_GEN4(dev_priv))))
>  		pci_set_power_state(drm_dev->pdev, PCI_D3hot);
>  
>  	return 0;
> -- 
> 2.1.0

Paul Bolle March 18, 2015, 4:22 p.m. UTC | #2
On Wed, 2015-03-18 at 12:22 +0200, Ville Syrjälä wrote:
> We had another bug report which showed similar problems on something
> as recent as SNB:
> https://bugzilla.kernel.org/show_bug.cgi?id=94241
> So I guess we really want to make the check 'gen < 7'.
> 
> My IVB X1 Carbon doesn't need this quirk, so hopefully that indicates
> the Lenovo BIOSen became more sane for gen7+.

On the other hand my ThinkPad X220 has vendor:device ids 8086:0126,
which makes it a gen6 device (assuming I parsed the various preprocessor
defines in include/drm/i915_pciids.h and drivers/gpu/drm/i915/i915_drv.c
correctly). That laptop is now running v3.19.1 and never hit this issue.
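
To make the id-to-gen mapping concrete, here is a small userspace sketch
of the kind of lookup those preprocessor defines boil down to. The table
and gen_for_device_id() are hypothetical and cover only a handful of
device ids; the ids and generations are from my reading of
i915_pciids.h and should be treated as assumptions. Only 0x0126 comes
from this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping from a PCI device id (vendor 8086) to a gen. */
struct device_gen {
	uint16_t device_id;
	int gen;
};

/* A small, incomplete sample; the real driver covers far more ids. */
static const struct device_gen device_gens[] = {
	{ 0x2592, 3 },	/* 915GM */
	{ 0x2a02, 4 },	/* 965GM */
	{ 0x2a42, 4 },	/* GM45 */
	{ 0x0126, 6 },	/* Sandy Bridge mobile GT2 */
	{ 0x0166, 7 },	/* Ivy Bridge mobile GT2 */
};

/* Returns the gen for a known device id, or -1 if it is not in the table. */
static int gen_for_device_id(uint16_t device_id)
{
	size_t n = sizeof(device_gens) / sizeof(device_gens[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (device_gens[i].device_id == device_id)
			return device_gens[i].gen;
	}
	return -1;
}
```

If the table is right, 8086:0126 resolves to gen6, so a 'gen < 7' check
would also apply the workaround to this X220 even though it never
showed the hang.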


Paul Bolle

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index cc6ea53d2b81..3a07164f5860 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -641,11 +641,12 @@  static int i915_drm_suspend_late(struct drm_device *drm_dev, bool hibernation)
 	 * the device even though it's already in D3 and hang the machine. So
 	 * leave the device in D0 on those platforms and hope the BIOS will
 	 * power down the device properly. Platforms where this was seen:
-	 * Lenovo Thinkpad X301, X61s
+	 * Lenovo Thinkpad X301, X61s, X41
 	 */
 	if (!(hibernation &&
-	      drm_dev->pdev->subsystem_vendor == PCI_VENDOR_ID_LENOVO &&
-	      INTEL_INFO(dev_priv)->gen == 4))
+	      (drm_dev->pdev->subsystem_vendor == PCI_VENDOR_ID_LENOVO ||
+	       drm_dev->pdev->subsystem_vendor == PCI_SUBVENDOR_ID_IBM) &&
+	      (IS_GEN3(dev_priv) || IS_GEN4(dev_priv))))
 		pci_set_power_state(drm_dev->pdev, PCI_D3hot);
 
 	return 0;