Message ID | 1476283597-580-1-git-send-email-imre.deak@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Oct 12, 2016 at 05:46:37PM +0300, Imre Deak wrote: > Currently resuming on HSW from S3 pm_test/devices state leads to an > unrecoverable GPU hang. Resetting the GPU during suspend fixes this. For > a full S3 cycle this change only means the reset happens earlier (before > reaching S3). For S4 the reset will happen now both during the freeze > and quiesce phases, which is a benefit since it will guarantee that the > GPU is idle before creating and loading the hibernation image. > > Cc: Mika Kuoppala <mika.kuoppala@intel.com> > Cc: Chris Wilson <chris@chris-wilson.co.uk> > Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> > Signed-off-by: Imre Deak <imre.deak@intel.com> Makes sense, we should treat the transition to suspend just like unload. We should do the symmetric reset on load/resume as well. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> -Chris
> == Series Details == > > Series: drm/i915/hsw: Fix GPU hang during resume from S3-devices state > URL : https://patchwork.freedesktop.org/series/13654/ > State : warning > > == Summary == > > Series 13654v1 drm/i915/hsw: Fix GPU hang during resume from S3-devices > state > https://patchwork.freedesktop.org/api/1.0/series/13654/revisions/1/mbox/ > > Test drv_module_reload_basic: > skip -> PASS (fi-skl-6770hq) > Test kms_cursor_legacy: > Subgroup basic-flip-after-cursor-legacy: > pass -> DMESG-WARN (fi-ivb-3770) [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 144 [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 144 > Test kms_flip: > Subgroup basic-flip-vs-modeset: > dmesg-warn -> PASS (fi-skl-6770hq) > Test kms_pipe_crc_basic: > Subgroup suspend-read-crc-pipe-b: > dmesg-warn -> PASS (fi-byt-j1900) > Test kms_psr_sink_crc: > Subgroup psr_basic: > dmesg-warn -> PASS (fi-skl-6700hq) > Test vgem_basic: > Subgroup unload: > pass -> SKIP (fi-bdw-5557u) > skip -> PASS (fi-hsw-4770) > > fi-bdw-5557u total:248 pass:231 dwarn:0 dfail:0 fail:0 skip:17 > fi-bsw-n3050 total:248 pass:205 dwarn:0 dfail:0 fail:0 skip:43 > fi-bxt-t5700 total:248 pass:217 dwarn:0 dfail:0 fail:0 skip:31 > fi-byt-j1900 total:248 pass:214 dwarn:1 dfail:0 fail:1 skip:32 > fi-byt-n2820 total:248 pass:211 dwarn:0 dfail:0 fail:1 skip:36 > fi-hsw-4770 total:248 pass:225 dwarn:0 dfail:0 fail:0 skip:23 > fi-hsw-4770r total:248 pass:225 dwarn:0 dfail:0 fail:0 skip:23 > fi-ilk-650 total:248 pass:185 dwarn:0 dfail:0 fail:2 skip:61 > fi-ivb-3520m total:248 pass:222 dwarn:0 dfail:0 fail:0 skip:26 > fi-ivb-3770 total:248 pass:221 dwarn:1 dfail:0 fail:0 skip:26 > fi-kbl-7200u total:248 pass:222 dwarn:0 dfail:0 fail:0 skip:26 > fi-skl-6260u total:248 pass:233 dwarn:0 dfail:0 fail:0 skip:15 > fi-skl-6700hq total:248 pass:225 dwarn:0 dfail:0 fail:0 skip:23 > fi-skl-6700k total:248 pass:222 dwarn:1 dfail:0 fail:0 skip:25 > fi-skl-6770hq total:248 pass:231 dwarn:1 dfail:0 fail:1 skip:15 > fi-snb-2520m total:248 pass:211 dwarn:0 dfail:0 fail:0 skip:37 > fi-snb-2600 total:248 pass:210 dwarn:0 dfail:0 fail:0 skip:38 > > Results at /archive/results/CI_IGT_test/Patchwork_2690/ > > 14740bb25ec36fe4ce8042af3eb48aeb45e5bc13 drm-intel-nightly: 2016y-10m- > 12d-16h-18m-24s UTC integration manifest > 78a5d29 drm/i915/hsw: Fix GPU hang during resume from S3-devices state > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx Jani Saarinen Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
On ke, 2016-10-12 at 21:32 +0300, Saarinen, Jani wrote: > > == Series Details == > > > > Series: drm/i915/hsw: Fix GPU hang during resume from S3-devices state > > URL : https://patchwork.freedesktop.org/series/13654/ > > State : warning > > > > == Summary == > > > > Series 13654v1 drm/i915/hsw: Fix GPU hang during resume from S3-devices > > state > > https://patchwork.freedesktop.org/api/1.0/series/13654/revisions/1/mbox/ > > > > Test drv_module_reload_basic: > > skip -> PASS (fi-skl-6770hq) > > Test kms_cursor_legacy: > > Subgroup basic-flip-after-cursor-legacy: > > pass -> DMESG-WARN (fi-ivb-3770) > > [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 144 > [drm:drm_edid_block_valid] *ERROR* EDID checksum is invalid, remainder is 144 A pre-existing bug, it seems it started in CI_DRM_1701, not long after attaching the HDMI monitor the EDID belongs to. I opened a new bug: https://bugs.freedesktop.org/show_bug.cgi?id=98228 Thanks for the review, I pushed the patch to -dinq. > > > Test kms_flip: > > Subgroup basic-flip-vs-modeset: > > dmesg-warn -> PASS (fi-skl-6770hq) > > Test kms_pipe_crc_basic: > > Subgroup suspend-read-crc-pipe-b: > > dmesg-warn -> PASS (fi-byt-j1900) > > Test kms_psr_sink_crc: > > Subgroup psr_basic: > > dmesg-warn -> PASS (fi-skl-6700hq) > > Test vgem_basic: > > Subgroup unload: > > pass -> SKIP (fi-bdw-5557u) > > skip -> PASS (fi-hsw-4770) > > > > fi-bdw-5557u total:248 pass:231 dwarn:0 dfail:0 fail:0 skip:17 > > fi-bsw-n3050 total:248 pass:205 dwarn:0 dfail:0 fail:0 skip:43 > > fi-bxt-t5700 total:248 pass:217 dwarn:0 dfail:0 fail:0 skip:31 > > fi-byt-j1900 total:248 pass:214 dwarn:1 dfail:0 fail:1 skip:32 > > fi-byt-n2820 total:248 pass:211 dwarn:0 dfail:0 fail:1 skip:36 > > fi-hsw-4770 total:248 pass:225 dwarn:0 dfail:0 fail:0 skip:23 > > fi-hsw-4770r total:248 pass:225 dwarn:0 dfail:0 fail:0 skip:23 > > fi-ilk-650 total:248 pass:185 dwarn:0 dfail:0 fail:2 skip:61 > > fi-ivb-3520m total:248 pass:222 dwarn:0 dfail:0 fail:0 skip:26 > > fi-ivb-3770 total:248 pass:221 dwarn:1 dfail:0 fail:0 skip:26 > > fi-kbl-7200u total:248 pass:222 dwarn:0 dfail:0 fail:0 skip:26 > > fi-skl-6260u total:248 pass:233 dwarn:0 dfail:0 fail:0 skip:15 > > fi-skl-6700hq total:248 pass:225 dwarn:0 dfail:0 fail:0 skip:23 > > fi-skl-6700k total:248 pass:222 dwarn:1 dfail:0 fail:0 skip:25 > > fi-skl-6770hq total:248 pass:231 dwarn:1 dfail:0 fail:1 skip:15 > > fi-snb-2520m total:248 pass:211 dwarn:0 dfail:0 fail:0 skip:37 > > fi-snb-2600 total:248 pass:210 dwarn:0 dfail:0 fail:0 skip:38 > > > > Results at /archive/results/CI_IGT_test/Patchwork_2690/ > > > > 14740bb25ec36fe4ce8042af3eb48aeb45e5bc13 drm-intel-nightly: 2016y-10m- > > 12d-16h-18m-24s UTC integration manifest > > 78a5d29 drm/i915/hsw: Fix GPU hang during resume from S3-devices state > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx > > Jani Saarinen > Intel Finland Oy - BIC 0357606-4 - Westendinkatu 7, 02160 Espoo
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 89d3222..e9b3bfc 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -532,32 +532,6 @@ static const struct vga_switcheroo_client_ops i915_switcheroo_ops = { static void i915_gem_fini(struct drm_device *dev) { - struct drm_i915_private *dev_priv = to_i915(dev); - - /* - * Neither the BIOS, ourselves or any other kernel - * expects the system to be in execlists mode on startup, - * so we need to reset the GPU back to legacy mode. And the only - * known way to disable logical contexts is through a GPU reset. - * - * So in order to leave the system in a known default configuration, - * always reset the GPU upon unload. Afterwards we then clean up the - * GEM state tracking, flushing off the requests and leaving the - * system in a known idle state. - * - * Note that is of the upmost importance that the GPU is idle and - * all stray writes are flushed *before* we dismantle the backing - * storage for the pinned objects. - * - * However, since we are uncertain that reseting the GPU on older - * machines is a good idea, we don't - just in case it leaves the - * machine in an unusable condition. - */ - if (HAS_HW_CONTEXTS(dev)) { - int reset = intel_gpu_reset(dev_priv, ALL_ENGINES); - WARN_ON(reset && reset != -ENODEV); - } - mutex_lock(&dev->struct_mutex); i915_gem_cleanup_engines(dev); i915_gem_context_fini(dev); @@ -636,6 +610,8 @@ static int i915_load_modeset_init(struct drm_device *dev) return 0; cleanup_gem: + if (i915_gem_suspend(dev)) + DRM_ERROR("failed to idle hardware; continuing to unload!\n"); i915_gem_fini(dev); cleanup_irq: intel_guc_fini(dev); diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index fdd496e..a86bc8f 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -4273,6 +4273,30 @@ int i915_gem_suspend(struct drm_device *dev) */ WARN_ON(dev_priv->gt.awake); + /* + * Neither the BIOS, ourselves or any other kernel + * expects the system to be in execlists mode on startup, + * so we need to reset the GPU back to legacy mode. And the only + * known way to disable logical contexts is through a GPU reset. + * + * So in order to leave the system in a known default configuration, + * always reset the GPU upon unload and suspend. Afterwards we then + * clean up the GEM state tracking, flushing off the requests and + * leaving the system in a known idle state. + * + * Note that is of the upmost importance that the GPU is idle and + * all stray writes are flushed *before* we dismantle the backing + * storage for the pinned objects. + * + * However, since we are uncertain that reseting the GPU on older + * machines is a good idea, we don't - just in case it leaves the + * machine in an unusable condition. + */ + if (HAS_HW_CONTEXTS(dev)) { + int reset = intel_gpu_reset(dev_priv, ALL_ENGINES); + WARN_ON(reset && reset != -ENODEV); + } + return 0; err:
Currently resuming on HSW from S3 pm_test/devices state leads to an unrecoverable GPU hang. Resetting the GPU during suspend fixes this. For a full S3 cycle this change only means the reset happens earlier (before reaching S3). For S4 the reset will happen now both during the freeze and quiesce phases, which is a benefit since it will guarantee that the GPU is idle before creating and loading the hibernation image. Cc: Mika Kuoppala <mika.kuoppala@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/gpu/drm/i915/i915_drv.c | 28 ++-------------------------- drivers/gpu/drm/i915/i915_gem.c | 24 ++++++++++++++++++++++++ 2 files changed, 26 insertions(+), 26 deletions(-)