Patchwork drm/i915: Unwind i915_gem_init() failure

Submitter Chris Wilson
Date Dec. 7, 2017, 11:56 p.m.
Message ID <20171207235625.9135-1-chris@chris-wilson.co.uk>
Permalink /patch/10101263/
State New

Comments

Chris Wilson - Dec. 7, 2017, 11:56 p.m.
Since Michal introduced new errors other than -EIO during
i915_gem_init(), we need to actually unwind on the error path as we have
to abort the module load (and we expect to do so cleanly!).

As we now tear down key state and then mark the driver as wedged (on
-EIO), we have to be careful not to allow ourselves to resume and
unwedge, thus attempting to use the uninitialised driver.

References: 8620eb1dbbf2 ("drm/i915/uc: Don't use -EIO to report missing firmware")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Sagar Arun Kamble <sagar.a.kamble@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 55 ++++++++++++++++++++++++++++++++---------
 1 file changed, 43 insertions(+), 12 deletions(-)
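
The wedging policy described in the commit message can be illustrated with a small standalone sketch. It is not the i915 code: `struct fake_gpu`, `fake_init_result()` and `fake_unset_wedged()` are hypothetical names, and the only behaviour modelled is what the patch shows, namely that -EIO during init is absorbed by marking the GPU wedged, any other error is returned to abort the load, and unwedging is refused while `kernel_context` is NULL.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state; not struct drm_i915_private. */
struct fake_gpu {
	bool wedged;
	void *kernel_context;	/* NULL if init never completed */
};

/*
 * Translate the raw init result into what the module load sees.
 * -EIO means "the GPU is angry": mark it wedged and carry on.
 * Anything else (e.g. -ENOMEM) is propagated so the load aborts.
 */
int fake_init_result(struct fake_gpu *gpu, int ret)
{
	if (ret == -EIO) {
		gpu->wedged = true;
		return 0;
	}
	return ret;
}

/*
 * Mirror of the guard the patch adds to the unwedge path: a driver
 * that never finished initialising must stay wedged.
 */
bool fake_unset_wedged(struct fake_gpu *gpu)
{
	if (!gpu->wedged)
		return true;	/* nothing to do */

	if (!gpu->kernel_context)
		return false;	/* never initialised, cannot unwedge */

	gpu->wedged = false;
	return true;
}
```

With this shape, a resume after a failed init finds the GPU still wedged and leaves it alone, which is the situation the commit message warns about.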
Chris Wilson - Dec. 8, 2017, 12:29 a.m.
Quoting Chris Wilson (2017-12-07 23:56:25)
> Since Michal introduced new errors other than -EIO during
> i915_gem_init(), we need to actually unwind on the error path as we have
> to abort the module load (and we expect to do so cleanly!).
> 
> As we now tear down key state and then mark the driver as wedged (on
> -EIO), we have to be careful not to allow ourselves to resume and
> unwedge, thus attempting to use the uninitialised driver.

Hmm, I don't think this is sufficient just yet, e.g. execbuf should now
report -EINVAL for the absent engine as opposed to -EIO we expect.
Context allocation will still hit uninitialised idr etc.

Back to the plan of having if (ret != -EIO) at each point, I guess.
-Chris
Chris Wilson - Dec. 11, 2017, 1:17 p.m.
Quoting Patchwork (2017-12-11 13:04:37)
> == Series Details ==
> 
> Series: drm/i915: Unwind i915_gem_init() failure (rev3)
> URL   : https://patchwork.freedesktop.org/series/35060/
> State : success
> 
> == Summary ==
> 
> Series 35060v3 drm/i915: Unwind i915_gem_init() failure
> https://patchwork.freedesktop.org/api/1.0/series/35060/revisions/3/mbox/
> 
> Test debugfs_test:
>         Subgroup read_all_entries:
>                 pass       -> DMESG-FAIL (fi-elk-e7500) fdo#103989 +1
> Test gem_exec_fence:
>         Subgroup nb-await-default:
>                 dmesg-fail -> PASS       (fi-pnv-d510)
> Test gem_exec_reloc:
>         Subgroup basic-cpu-read-active:
>                 pass       -> FAIL       (fi-gdg-551) fdo#102582 +2
> Test gem_mmap_gtt:
>         Subgroup basic-small-bo-tiledx:
>                 fail       -> PASS       (fi-gdg-551) fdo#102575
> Test kms_cursor_legacy:
>         Subgroup basic-busy-flip-before-cursor-legacy:
>                 fail       -> PASS       (fi-gdg-551) fdo#102618
> Test kms_pipe_crc_basic:
>         Subgroup suspend-read-crc-pipe-a:
>                 pass       -> DMESG-WARN (fi-kbl-r) fdo#104172 +1
>         Subgroup suspend-read-crc-pipe-b:
>                 incomplete -> PASS       (fi-snb-2520m) fdo#103713

Drat, old version of igt. Will need to send again later.
-Chris
Chris Wilson - Dec. 13, 2017, 6:56 p.m.
Quoting Patchwork (2017-12-13 18:26:21)
> == Series Details ==
> 
> Series: drm/i915: Unwind i915_gem_init() failure (rev4)
> URL   : https://patchwork.freedesktop.org/series/35060/
> State : warning
> 
> == Summary ==
> 
> Test gem_tiled_swapping:
>         Subgroup non-threaded:
>                 pass       -> INCOMPLETE (shard-snb) fdo#104009
>                 pass       -> INCOMPLETE (shard-hsw) fdo#104218
> Test kms_frontbuffer_tracking:
>         Subgroup fbc-1p-offscren-pri-shrfb-draw-render:
>                 pass       -> FAIL       (shard-snb) fdo#101623
> Test drv_suspend:
>         Subgroup sysfs-reader:
>                 pass       -> SKIP       (shard-hsw)
> Test kms_cursor_crc:
>         Subgroup cursor-256x256-suspend:
>                 skip       -> PASS       (shard-snb) fdo#103375

With no better suggestions, and as this fixes lots of WARN spam from
invalid modparams, I've pushed this patch. Thanks for the review;
improvements are very much welcome.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 67dc11effc8e..a6a7ce861c37 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3245,7 +3245,12 @@  bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	if (!test_bit(I915_WEDGED, &i915->gpu_error.flags))
 		return true;
 
-	/* Before unwedging, make sure that all pending operations
+	/* Never successfully initialised, so can not unwedge? */
+	if (!i915->kernel_context)
+		return false;
+
+	/*
+	 * Before unwedging, make sure that all pending operations
 	 * are flushed and errored out - we may have requests waiting upon
 	 * third party fences. We marked all inflight requests as EIO, and
 	 * every execbuf since returned EIO, for consistency we want all
@@ -4863,7 +4868,8 @@  void i915_gem_resume(struct drm_i915_private *i915)
 	i915_gem_restore_gtt_mappings(i915);
 	i915_gem_restore_fences(i915);
 
-	/* As we didn't flush the kernel context before suspend, we cannot
+	/*
+	 * As we didn't flush the kernel context before suspend, we cannot
 	 * guarantee that the context image is complete. So let's just reset
 	 * it and start again.
 	 */
@@ -4884,8 +4890,10 @@  void i915_gem_resume(struct drm_i915_private *i915)
 	return;
 
 err_wedged:
-	DRM_ERROR("failed to re-initialize GPU, declaring wedged!\n");
-	i915_gem_set_wedged(i915);
+	if (!i915_terminally_wedged(&i915->gpu_error)) {
+		DRM_ERROR("failed to re-initialize GPU, declaring wedged!\n");
+		i915_gem_set_wedged(i915);
+	}
 	goto out_unlock;
 }
 
@@ -5169,21 +5177,21 @@  int i915_gem_init(struct drm_i915_private *dev_priv)
 
 	ret = i915_gem_init_ggtt(dev_priv);
 	if (ret)
-		goto out_unlock;
+		goto err_unlock;
 
 	ret = i915_gem_contexts_init(dev_priv);
 	if (ret)
-		goto out_unlock;
+		goto err_ggtt;
 
 	ret = intel_engines_init(dev_priv);
 	if (ret)
-		goto out_unlock;
+		goto err_context;
 
 	intel_init_gt_powersave(dev_priv);
 
 	ret = i915_gem_init_hw(dev_priv);
 	if (ret)
-		goto out_unlock;
+		goto err_pm;
 
 	/*
 	 * Despite its name intel_init_clock_gating applies both display
@@ -5197,9 +5205,33 @@  int i915_gem_init(struct drm_i915_private *dev_priv)
 	intel_init_clock_gating(dev_priv);
 
 	ret = __intel_engines_record_defaults(dev_priv);
-out_unlock:
+	if (ret)
+		goto err_init_hw;
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+	mutex_unlock(&dev_priv->drm.struct_mutex);
+
+	return 0;
+
+err_init_hw:
+	i915_gem_wait_for_idle(dev_priv, I915_WAIT_LOCKED);
+	i915_gem_contexts_lost(dev_priv);
+	intel_uc_fini_hw(dev_priv);
+err_pm:
+	intel_cleanup_gt_powersave(dev_priv);
+	i915_gem_cleanup_engines(dev_priv);
+err_context:
+	i915_gem_contexts_fini(dev_priv);
+err_ggtt:
+err_unlock:
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+	mutex_unlock(&dev_priv->drm.struct_mutex);
+
+	i915_gem_cleanup_userptr(dev_priv);
+
 	if (ret == -EIO) {
-		/* Allow engine initialisation to fail by marking the GPU as
+		/*
+		 * Allow engine initialisation to fail by marking the GPU as
 		 * wedged. But we only want to do this where the GPU is angry,
 		 * for all other failure, such as an allocation failure, bail.
 		 */
@@ -5209,9 +5241,8 @@  int i915_gem_init(struct drm_i915_private *dev_priv)
 		}
 		ret = 0;
 	}
-	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
-	mutex_unlock(&dev_priv->drm.struct_mutex);
 
+	i915_gem_drain_freed_objects(dev_priv);
 	return ret;
 }
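
For readers unfamiliar with the goto-based unwind used in the i915_gem_init() hunk, here is a minimal standalone sketch of the pattern. Everything in it is an assumption made for illustration: `struct fake_dev`, the `STAGE_*` names and the helper functions are hypothetical, and unlike the patch (where `err_ggtt` and `err_unlock` share one cleanup) each stage here has its own teardown step.

```c
#include <errno.h>

/* Hypothetical init stages; they only loosely echo the ones in the patch. */
enum stage { STAGE_GGTT, STAGE_CONTEXTS, STAGE_ENGINES, STAGE_HW, STAGE_COUNT };

struct fake_dev {
	int live[STAGE_COUNT];	/* 1 while a stage is set up */
	int fail_at;		/* stage that should fail, or -1 for none */
	int fail_err;		/* negative errno reported by the failing stage */
};

static int stage_init(struct fake_dev *d, enum stage s)
{
	if (d->fail_at == (int)s)
		return d->fail_err;
	d->live[s] = 1;
	return 0;
}

static void stage_fini(struct fake_dev *d, enum stage s)
{
	d->live[s] = 0;
}

/*
 * Each failure jumps to the label that undoes only the stages that
 * already succeeded; the labels fall through in reverse init order.
 */
int fake_init(struct fake_dev *d)
{
	int ret;

	ret = stage_init(d, STAGE_GGTT);
	if (ret)
		goto err_out;

	ret = stage_init(d, STAGE_CONTEXTS);
	if (ret)
		goto err_ggtt;

	ret = stage_init(d, STAGE_ENGINES);
	if (ret)
		goto err_contexts;

	ret = stage_init(d, STAGE_HW);
	if (ret)
		goto err_engines;

	return 0;

err_engines:
	stage_fini(d, STAGE_ENGINES);
err_contexts:
	stage_fini(d, STAGE_CONTEXTS);
err_ggtt:
	stage_fini(d, STAGE_GGTT);
err_out:
	return ret;
}

/* Count the stages still set up, to check that a failed init leaks nothing. */
int fake_live_count(const struct fake_dev *d)
{
	int i, n = 0;

	for (i = 0; i < STAGE_COUNT; i++)
		n += d->live[i];
	return n;
}
```

A failure injected at any stage should leave `fake_live_count()` at zero, which is the "abort the module load cleanly" property the commit message asks for.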