drm/i915/selftests: Fail hangcheck testing if the GPU is wedged

Message ID 20180705150214.28316-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson July 5, 2018, 3:02 p.m. UTC
If the GPU is irrecoverably wedged on startup, it means that it failed
on initialisation and we have already tried to reset it but failed. We
can ignore all further testing, as it is already dead. Failing early
prevents us from slowly failing in our endeavours later and timing out.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Rodrigo Vivi July 5, 2018, 8:44 p.m. UTC | #1
On Thu, Jul 05, 2018 at 04:02:14PM +0100, Chris Wilson wrote:
> If the GPU is irrecoverably wedged on startup, it means that it failed
> on initialisation and we have already tried to reset it but failed. We
> can ignore all further testing, as it is already dead. Failing early,
> prevents us from slowly failing in our endeavours later and timing out.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index fe7d3190ebfe..fca073c96c2d 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -1243,6 +1243,9 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
>  	if (!intel_has_gpu_reset(i915))
>  		return 0;
>  
> +	if (i915_terminally_wedged(&i915->gpu_error))
> +		return -EIO; /* we're long past hope of a successful reset */
> +

Maybe -ENOTRECOVERABLE ?

Anyways

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


>  	intel_runtime_pm_get(i915);
>  	saved_hangcheck = fetch_and_zero(&i915_modparams.enable_hangcheck);
>  
> -- 
> 2.18.0
> 
Chris Wilson July 6, 2018, 6:37 a.m. UTC | #2
Quoting Rodrigo Vivi (2018-07-05 21:44:56)
> On Thu, Jul 05, 2018 at 04:02:14PM +0100, Chris Wilson wrote:
> > If the GPU is irrecoverably wedged on startup, it means that it failed
> > on initialisation and we have already tried to reset it but failed. We
> > can ignore all further testing, as it is already dead. Failing early,
> > prevents us from slowly failing in our endeavours later and timing out.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > index fe7d3190ebfe..fca073c96c2d 100644
> > --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> > @@ -1243,6 +1243,9 @@ int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
> >       if (!intel_has_gpu_reset(i915))
> >               return 0;
> >  
> > +     if (i915_terminally_wedged(&i915->gpu_error))
> > +             return -EIO; /* we're long past hope of a successful reset */
> > +
> 
> Maybe -ENOTRECOVERABLE ?

Interesting choice. Our convention so far has been -EIO for losing state
due to a GPU hang, but an extra flavour for when we wedge the driver?

Hmm, fence->error needs to remain -EIO (differentiating that between
reset/wedge for userspace seems to convey no more information imo), and
we've already baked 
	if (i915_terminally_wedged(&i915->gpu_error))
		return -EIO;
into the abi for the points of interest. 

Sadly it is too late; I don't think we can pick another errno for the
cases where it actually matters.
-Chris
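
To illustrate the ABI concern raised in the reply above, here is a minimal userspace sketch. It is not driver code. Both function names are hypothetical, and the caller is only an assumed example of existing code that tests for -EIO; the thread does not show any real caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical: the errno a wedged-device path would report under
 * each of the two choices discussed in the thread.
 */
static int mock_wedged_errno(bool use_enotrecoverable)
{
	return use_enotrecoverable ? -ENOTRECOVERABLE : -EIO;
}

/*
 * Hypothetical existing caller, assumed to have been written against
 * the current convention: it recognises only -EIO as "GPU state lost".
 */
static bool mock_caller_sees_gpu_lost(int err)
{
	return err == -EIO;
}

/* Usage: the existing caller keeps working only while -EIO is returned. */
static void mock_errno_demo(void)
{
	assert(mock_caller_sees_gpu_lost(mock_wedged_errno(false)));
	assert(!mock_caller_sees_gpu_lost(mock_wedged_errno(true)));
}
```

Under these assumptions, switching the wedged case to -ENOTRECOVERABLE would silently change what such a caller observes, which is one reading of why the errno is treated as fixed once it is part of the ABI.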

Patch

diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index fe7d3190ebfe..fca073c96c2d 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -1243,6 +1243,9 @@  int intel_hangcheck_live_selftests(struct drm_i915_private *i915)
 	if (!intel_has_gpu_reset(i915))
 		return 0;
 
+	if (i915_terminally_wedged(&i915->gpu_error))
+		return -EIO; /* we're long past hope of a successful reset */
+
 	intel_runtime_pm_get(i915);
 	saved_hangcheck = fetch_and_zero(&i915_modparams.enable_hangcheck);
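
The control flow the patch adds can be sketched outside the kernel as follows. This is a simplified userspace model, not the driver's implementation: struct mock_gpu, its fields, and the mock_* functions are hypothetical stand-ins for struct drm_i915_private, intel_has_gpu_reset() and i915_terminally_wedged(), and the subtest loop is an assumption about what follows the checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver state (not struct drm_i915_private). */
struct mock_gpu {
	bool has_reset;          /* models intel_has_gpu_reset() */
	bool terminally_wedged;  /* models i915_terminally_wedged() */
	int subtests_run;        /* counts subtests that actually executed */
};

/* Hypothetical subtest: records that it ran and reports success. */
static int mock_subtest(struct mock_gpu *gpu)
{
	gpu->subtests_run++;
	return 0;
}

/* Models the entry point: skip, fail early, or run the subtests. */
static int mock_hangcheck_live_selftests(struct mock_gpu *gpu)
{
	int err;
	int i;

	if (!gpu->has_reset)
		return 0; /* nothing to test: a skip, not a failure */

	if (gpu->terminally_wedged)
		return -EIO; /* already dead: fail before running anything */

	for (i = 0; i < 3; i++) {
		err = mock_subtest(gpu);
		if (err)
			return err;
	}

	return 0;
}

/* Usage: a wedged device fails immediately and runs no subtests. */
static void mock_selftest_demo(void)
{
	struct mock_gpu wedged = { .has_reset = true, .terminally_wedged = true };

	assert(mock_hangcheck_live_selftests(&wedged) == -EIO);
	assert(wedged.subtests_run == 0);
}
```

The ordering matters in this model: a device without reset support is skipped with 0, while a wedged device that does support reset is reported as a failure before any subtest has a chance to time out.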