Message ID | 20180705150214.28316-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
On Thu, Jul 05, 2018 at 04:02:14PM +0100, Chris Wilson wrote:
> If the GPU is irrecoverably wedged on startup, it means that it failed
> on initialisation and we have already tried to reset it but failed. We
> can ignore all further testing, as it is already dead. Failing early,
> prevents us from slowly failing in our endeavours later and timing out.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index fe7d3190ebfe..fca073c96c2d 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -1243,6 +1243,9 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
>  	if (!intel_has_gpu_reset(i915))
>  		return 0;
>
> +	if (i915_terminally_wedged(&i915->gpu_error))
> +		return -EIO; /* we're long past hope of a successful reset */
> +

Maybe -ENOTRECOVERABLE ?

Anyways

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

>  	intel_runtime_pm_get(i915);
>  	saved_hangcheck = fetch_and_zero(&i915_modparams.enable_hangcheck);
>
> --
> 2.18.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Quoting Rodrigo Vivi (2018-07-05 21:44:56)
> On Thu, Jul 05, 2018 at 04:02:14PM +0100, Chris Wilson wrote:
> > If the GPU is irrecoverably wedged on startup, it means that it failed
> > on initialisation and we have already tried to reset it but failed. We
> > can ignore all further testing, as it is already dead. Failing early,
> > prevents us from slowly failing in our endeavours later and timing out.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > index fe7d3190ebfe..fca073c96c2d 100644
> > --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > @@ -1243,6 +1243,9 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
> >  	if (!intel_has_gpu_reset(i915))
> >  		return 0;
> >
> > +	if (i915_terminally_wedged(&i915->gpu_error))
> > +		return -EIO; /* we're long past hope of a successful reset */
> > +
>
> Maybe -ENOTRECOVERABLE ?

Interesting choice, our convention so far has been -EIO for losing state
due to a GPU hang, but an extra flavour for when we wedge the driver?

Hmm, fence->error needs to remain -EIO (differentiating that between
reset/wedge for userspace seems to convey no more information imo), and
we've already baked

	if (i915_terminally_wedged(&i915->gpu_error))
		return -EIO;

into the abi for the points of interest.

Sadly too late, I don't think we can pick another errno for the cases
it actually matter.
-Chris

diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index fe7d3190ebfe..fca073c96c2d 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -1243,6 +1243,9 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
 	if (!intel_has_gpu_reset(i915))
 		return 0;
 
+	if (i915_terminally_wedged(&i915->gpu_error))
+		return -EIO; /* we're long past hope of a successful reset */
+
 	intel_runtime_pm_get(i915);
 	saved_hangcheck = fetch_and_zero(&i915_modparams.enable_hangcheck);

If the GPU is irrecoverably wedged on startup, it means that it failed
on initialisation and we have already tried to reset it but failed. We
can ignore all further testing, as it is already dead. Failing early,
prevents us from slowly failing in our endeavours later and timing out.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 3 +++
 1 file changed, 3 insertions(+)
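
For readers without a kernel tree at hand, the ordering of the two guards in the patch can be sketched in user space. This is only an illustration under stated assumptions: `struct mock_gpu`, `mock_subtest()`, `mock_hangcheck_selftests()` and `mock_selfcheck()` are hypothetical stand-ins invented here, not i915 code. The two boolean fields merely mimic what `intel_has_gpu_reset()` and `i915_terminally_wedged()` report in the real driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not the real drm_i915_private. */
struct mock_gpu {
	bool has_reset;	/* mimics the intel_has_gpu_reset() check */
	bool wedged;	/* mimics the i915_terminally_wedged() check */
	int tests_run;	/* counts subtests that actually executed */
};

/* Hypothetical subtest: only records that it ran. */
static int mock_subtest(struct mock_gpu *gpu)
{
	gpu->tests_run++;
	return 0;
}

/* Same guard ordering as the patch: skip, then fail early, then test. */
static int mock_hangcheck_selftests(struct mock_gpu *gpu)
{
	if (!gpu->has_reset)
		return 0;	/* no reset support: nothing to test, not an error */

	if (gpu->wedged)
		return -EIO;	/* already dead: report it before running anything */

	return mock_subtest(gpu);
}

/* Exercises the three paths: skipped, wedged, healthy. */
static void mock_selfcheck(void)
{
	struct mock_gpu no_reset = { .has_reset = false, .wedged = true };
	struct mock_gpu dead = { .has_reset = true, .wedged = true };
	struct mock_gpu healthy = { .has_reset = true, .wedged = false };

	assert(mock_hangcheck_selftests(&no_reset) == 0);
	assert(no_reset.tests_run == 0);

	assert(mock_hangcheck_selftests(&dead) == -EIO);
	assert(dead.tests_run == 0);

	assert(mock_hangcheck_selftests(&healthy) == 0);
	assert(healthy.tests_run == 1);
}
```

In this sketch a wedged device returns `-EIO` without running any subtest, which is the "failing early" behaviour the commit message describes. A device without reset support still reports success, because the reset check comes first.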