diff mbox

drm/i915/hsw: Fix GPU hang during resume from S3-devices state

Message ID 1476283597-580-1-git-send-email-imre.deak@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Imre Deak Oct. 12, 2016, 2:46 p.m. UTC
Currently resuming on HSW from S3 pm_test/devices state leads to an
unrecoverable GPU hang. Resetting the GPU during suspend fixes this. For
a full S3 cycle this change only means the reset happens earlier (before
reaching S3). For S4 the reset will happen now both during the freeze
and quiesce phases, which is a benefit since it will guarantee that the
GPU is idle before creating and loading the hibernation image.

Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Imre Deak <imre.deak@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 28 ++--------------------------
 drivers/gpu/drm/i915/i915_gem.c | 24 ++++++++++++++++++++++++
 2 files changed, 26 insertions(+), 26 deletions(-)

Comments

Chris Wilson Oct. 12, 2016, 3:44 p.m. UTC | #1
On Wed, Oct 12, 2016 at 05:46:37PM +0300, Imre Deak wrote:
> Currently resuming on HSW from S3 pm_test/devices state leads to an
> unrecoverable GPU hang. Resetting the GPU during suspend fixes this. For
> a full S3 cycle this change only means the reset happens earlier (before
> reaching S3). For S4 the reset will happen now both during the freeze
> and quiesce phases, which is a benefit since it will guarantee that the
> GPU is idle before creating and loading the hibernation image.
> 
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Imre Deak <imre.deak@intel.com>

Makes sense, we should treat the transition to suspend just like unload.
We should do the symmetric reset on load/resume as well.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris
Saarinen, Jani Oct. 12, 2016, 6:32 p.m. UTC | #2
> == Series Details ==

> 

> Series: drm/i915/hsw: Fix GPU hang during resume from S3-devices state

> URL   : https://patchwork.freedesktop.org/series/13654/

> State : warning

> 

> == Summary ==

> 

> Series 13654v1 drm/i915/hsw: Fix GPU hang during resume from S3-devices

> state

> https://patchwork.freedesktop.org/api/1.0/series/13654/revisions/1/mbox/

> 

> Test drv_module_reload_basic:

>                 skip       -> PASS       (fi-skl-6770hq)

> Test kms_cursor_legacy:

>         Subgroup basic-flip-after-cursor-legacy:

>                 pass       -> DMESG-WARN (fi-ivb-3770)


 [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 144
 [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 144

> Test kms_flip:

>         Subgroup basic-flip-vs-modeset:

>                 dmesg-warn -> PASS       (fi-skl-6770hq)

> Test kms_pipe_crc_basic:

>         Subgroup suspend-read-crc-pipe-b:

>                 dmesg-warn -> PASS       (fi-byt-j1900)

> Test kms_psr_sink_crc:

>         Subgroup psr_basic:

>                 dmesg-warn -> PASS       (fi-skl-6700hq)

> Test vgem_basic:

>         Subgroup unload:

>                 pass       -> SKIP       (fi-bdw-5557u)

>                 skip       -> PASS       (fi-hsw-4770)

> 

> fi-bdw-5557u     total:248  pass:231  dwarn:0   dfail:0   fail:0   skip:17

> fi-bsw-n3050     total:248  pass:205  dwarn:0   dfail:0   fail:0   skip:43

> fi-bxt-t5700     total:248  pass:217  dwarn:0   dfail:0   fail:0   skip:31

> fi-byt-j1900     total:248  pass:214  dwarn:1   dfail:0   fail:1   skip:32

> fi-byt-n2820     total:248  pass:211  dwarn:0   dfail:0   fail:1   skip:36

> fi-hsw-4770      total:248  pass:225  dwarn:0   dfail:0   fail:0   skip:23

> fi-hsw-4770r     total:248  pass:225  dwarn:0   dfail:0   fail:0   skip:23

> fi-ilk-650       total:248  pass:185  dwarn:0   dfail:0   fail:2   skip:61

> fi-ivb-3520m     total:248  pass:222  dwarn:0   dfail:0   fail:0   skip:26

> fi-ivb-3770      total:248  pass:221  dwarn:1   dfail:0   fail:0   skip:26

> fi-kbl-7200u     total:248  pass:222  dwarn:0   dfail:0   fail:0   skip:26

> fi-skl-6260u     total:248  pass:233  dwarn:0   dfail:0   fail:0   skip:15

> fi-skl-6700hq    total:248  pass:225  dwarn:0   dfail:0   fail:0   skip:23

> fi-skl-6700k     total:248  pass:222  dwarn:1   dfail:0   fail:0   skip:25

> fi-skl-6770hq    total:248  pass:231  dwarn:1   dfail:0   fail:1   skip:15

> fi-snb-2520m     total:248  pass:211  dwarn:0   dfail:0   fail:0   skip:37

> fi-snb-2600      total:248  pass:210  dwarn:0   dfail:0   fail:0   skip:38

> 

> Results at /archive/results/CI_IGT_test/Patchwork_2690/

> 

> 14740bb25ec36fe4ce8042af3eb48aeb45e5bc13 drm-intel-nightly: 2016y-10m-

> 12d-16h-18m-24s UTC integration manifest

> 78a5d29 drm/i915/hsw: Fix GPU hang during resume from S3-devices state

> 

> _______________________________________________

> Intel-gfx mailing list

> Intel-gfx@lists.freedesktop.org

> https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Jani Saarinen
Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
Imre Deak Oct. 13, 2016, 9:18 a.m. UTC | #3
On ke, 2016-10-12 at 21:32 +0300, Saarinen, Jani wrote:
> > == Series Details ==
> > 
> > Series: drm/i915/hsw: Fix GPU hang during resume from S3-devices state
> > URL   : https://patchwork.freedesktop.org/series/13654/
> > State : warning
> > 
> > == Summary ==
> > 
> > Series 13654v1 drm/i915/hsw: Fix GPU hang during resume from S3-devices
> > state
> > https://patchwork.freedesktop.org/api/1.0/series/13654/revisions/1/mbox/
> > 
> > Test drv_module_reload_basic:
> >                 skip       -> PASS       (fi-skl-6770hq)
> > Test kms_cursor_legacy:
> >         Subgroup basic-flip-after-cursor-legacy:
> >                 pass       -> DMESG-WARN (fi-ivb-3770)
> 
>  [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 144
>  [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 144

A pre-existing bug, it seems it started in CI_DRM_1701, not long after attaching
the HDMI monitor the EDID belongs to. I opened a new bug:
https://bugs.freedesktop.org/show_bug.cgi?id=98228

Thanks for the review, I pushed the patch to -dinq.

> 
> > Test kms_flip:
> >         Subgroup basic-flip-vs-modeset:
> >                 dmesg-warn -> PASS       (fi-skl-6770hq)
> > Test kms_pipe_crc_basic:
> >         Subgroup suspend-read-crc-pipe-b:
> >                 dmesg-warn -> PASS       (fi-byt-j1900)
> > Test kms_psr_sink_crc:
> >         Subgroup psr_basic:
> >                 dmesg-warn -> PASS       (fi-skl-6700hq)
> > Test vgem_basic:
> >         Subgroup unload:
> >                 pass       -> SKIP       (fi-bdw-5557u)
> >                 skip       -> PASS       (fi-hsw-4770)
> > 
> > fi-bdw-5557u     total:248  pass:231  dwarn:0   dfail:0   fail:0   skip:17
> > fi-bsw-n3050     total:248  pass:205  dwarn:0   dfail:0   fail:0   skip:43
> > fi-bxt-t5700     total:248  pass:217  dwarn:0   dfail:0   fail:0   skip:31
> > fi-byt-j1900     total:248  pass:214  dwarn:1   dfail:0   fail:1   skip:32
> > fi-byt-n2820     total:248  pass:211  dwarn:0   dfail:0   fail:1   skip:36
> > fi-hsw-4770      total:248  pass:225  dwarn:0   dfail:0   fail:0   skip:23
> > fi-hsw-4770r     total:248  pass:225  dwarn:0   dfail:0   fail:0   skip:23
> > fi-ilk-650       total:248  pass:185  dwarn:0   dfail:0   fail:2   skip:61
> > fi-ivb-3520m     total:248  pass:222  dwarn:0   dfail:0   fail:0   skip:26
> > fi-ivb-3770      total:248  pass:221  dwarn:1   dfail:0   fail:0   skip:26
> > fi-kbl-7200u     total:248  pass:222  dwarn:0   dfail:0   fail:0   skip:26
> > fi-skl-6260u     total:248  pass:233  dwarn:0   dfail:0   fail:0   skip:15
> > fi-skl-6700hq    total:248  pass:225  dwarn:0   dfail:0   fail:0   skip:23
> > fi-skl-6700k     total:248  pass:222  dwarn:1   dfail:0   fail:0   skip:25
> > fi-skl-6770hq    total:248  pass:231  dwarn:1   dfail:0   fail:1   skip:15
> > fi-snb-2520m     total:248  pass:211  dwarn:0   dfail:0   fail:0   skip:37
> > fi-snb-2600      total:248  pass:210  dwarn:0   dfail:0   fail:0   skip:38
> > 
> > Results at /archive/results/CI_IGT_test/Patchwork_2690/
> > 
> > 14740bb25ec36fe4ce8042af3eb48aeb45e5bc13 drm-intel-nightly: 2016y-10m-
> > 12d-16h-18m-24s UTC integration manifest
> > 78a5d29 drm/i915/hsw: Fix GPU hang during resume from S3-devices state
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> Jani Saarinen
> Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 89d3222..e9b3bfc 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -532,32 +532,6 @@  static const struct vga_switcheroo_client_ops i915_switcheroo_ops = {
 
 static void i915_gem_fini(struct drm_device *dev)
 {
-	struct drm_i915_private *dev_priv = to_i915(dev);
-
-	/*
-	 * Neither the BIOS, ourselves or any other kernel
-	 * expects the system to be in execlists mode on startup,
-	 * so we need to reset the GPU back to legacy mode. And the only
-	 * known way to disable logical contexts is through a GPU reset.
-	 *
-	 * So in order to leave the system in a known default configuration,
-	 * always reset the GPU upon unload. Afterwards we then clean up the
-	 * GEM state tracking, flushing off the requests and leaving the
-	 * system in a known idle state.
-	 *
-	 * Note that is of the upmost importance that the GPU is idle and
-	 * all stray writes are flushed *before* we dismantle the backing
-	 * storage for the pinned objects.
-	 *
-	 * However, since we are uncertain that reseting the GPU on older
-	 * machines is a good idea, we don't - just in case it leaves the
-	 * machine in an unusable condition.
-	 */
-	if (HAS_HW_CONTEXTS(dev)) {
-		int reset = intel_gpu_reset(dev_priv, ALL_ENGINES);
-		WARN_ON(reset && reset != -ENODEV);
-	}
-
 	mutex_lock(&dev->struct_mutex);
 	i915_gem_cleanup_engines(dev);
 	i915_gem_context_fini(dev);
@@ -636,6 +610,8 @@  static int i915_load_modeset_init(struct drm_device *dev)
 	return 0;
 
 cleanup_gem:
+	if (i915_gem_suspend(dev))
+		DRM_ERROR("failed to idle hardware; continuing to unload!\n");
 	i915_gem_fini(dev);
 cleanup_irq:
 	intel_guc_fini(dev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fdd496e..a86bc8f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4273,6 +4273,30 @@  int i915_gem_suspend(struct drm_device *dev)
 	 */
 	WARN_ON(dev_priv->gt.awake);
 
+	/*
+	 * Neither the BIOS, ourselves or any other kernel
+	 * expects the system to be in execlists mode on startup,
+	 * so we need to reset the GPU back to legacy mode. And the only
+	 * known way to disable logical contexts is through a GPU reset.
+	 *
+	 * So in order to leave the system in a known default configuration,
+	 * always reset the GPU upon unload and suspend. Afterwards we then
+	 * clean up the GEM state tracking, flushing off the requests and
+	 * leaving the system in a known idle state.
+	 *
+	 * Note that is of the upmost importance that the GPU is idle and
+	 * all stray writes are flushed *before* we dismantle the backing
+	 * storage for the pinned objects.
+	 *
+	 * However, since we are uncertain that reseting the GPU on older
+	 * machines is a good idea, we don't - just in case it leaves the
+	 * machine in an unusable condition.
+	 */
+	if (HAS_HW_CONTEXTS(dev)) {
+		int reset = intel_gpu_reset(dev_priv, ALL_ENGINES);
+		WARN_ON(reset && reset != -ENODEV);
+	}
+
 	return 0;
 
 err: