drm/i915: Fail harder if GPU reset fails outright

Message ID 20190625230815.32244-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson June 25, 2019, 11:08 p.m. UTC
If we request a reset and the GPU fails to respond, abandon all hope. If
the request is still stuck when we attempt to do another, fail early and
avoid requesting multiple possibly conflicting domains be reset
simultaneously.

We should never see this in practice, and if we do, it is already too
late.

References: https://bugs.freedesktop.org/show_bug.cgi?id=110998
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

Comments

Chris Wilson June 26, 2019, 9 a.m. UTC | #1
Quoting Chris Wilson (2019-06-26 00:08:15)
> If we request a reset and the GPU fails to respond, abandon all hope. If
> the request is still stuck when we attempt to do another, fail early and
> avoid requesting multiple possibly conflicting domains be reset
> simultaneously.
> 
> We should never see this in practice, and if we do, it is already too
> late.
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=110998
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 72002c0f9698..56c43f8cbc17 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -301,8 +301,16 @@ static int gen6_hw_domain_reset(struct drm_i915_private *i915,
>                                 u32 hw_domain_mask)
>  {
>         struct intel_uncore *uncore = &i915->uncore;
> +       u32 status;
>         int err;
>  
> +       /*
> +        * Check that all previous reset requests have been flushed so
> +        * that we don't simultaneously try to reset 2 overlapping domains.
> +        */
> +       if (intel_uncore_read_fw(uncore, GEN6_GDRST))

Thinking about this, this does nerf our attempt to try and reset two
engines at once from different events.

Put it on the back burner.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 72002c0f9698..56c43f8cbc17 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -301,8 +301,16 @@  static int gen6_hw_domain_reset(struct drm_i915_private *i915,
 				u32 hw_domain_mask)
 {
 	struct intel_uncore *uncore = &i915->uncore;
+	u32 status;
 	int err;
 
+	/*
+	 * Check that all previous reset requests have been flushed so
+	 * that we don't simultaneously try to reset 2 overlapping domains.
+	 */
+	if (intel_uncore_read_fw(uncore, GEN6_GDRST))
+		return -EIO;
+
 	/*
 	 * GEN6_GDRST is not in the gt power well, no need to check
 	 * for fifo space for the write or forcewake the chip for
@@ -314,10 +322,11 @@  static int gen6_hw_domain_reset(struct drm_i915_private *i915,
 	err = __intel_wait_for_register_fw(uncore,
 					   GEN6_GDRST, hw_domain_mask, 0,
 					   500, 0,
-					   NULL);
+					   &status);
+	intel_uncore_write_fw(uncore, GEN6_GDRST, 0);
 	if (err)
-		DRM_DEBUG_DRIVER("Wait for 0x%08x engines reset failed\n",
-				 hw_domain_mask);
+		DRM_DEBUG_DRIVER("Wait for 0x%08x [HW] engines reset failed: %08x\n",
+				 hw_domain_mask, status);
 
 	return err;
 }
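
For readers who want to trace the control flow outside the kernel, here is a rough user-space model of the patched function. It is only a sketch under stated assumptions: `struct fake_gpu`, `fake_gpu_tick()` and `fake_domain_reset()` are hypothetical names invented for illustration, the register is a plain integer rather than real MMIO, the poll count stands in for the 500us timeout, and returning `-ETIMEDOUT` on timeout is a choice made by this model. It also assumes that writing 0 after the wait clears any leftover request bits, which real hardware may not honour.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the GEN6_GDRST register; not the i915 API. */
struct fake_gpu {
	uint32_t gdrst;      /* reset-request bits currently pending */
	uint32_t stuck_mask; /* bits the simulated hardware never acknowledges */
};

/* One polling step: the simulated hardware clears every bit that is not stuck. */
static void fake_gpu_tick(struct fake_gpu *gpu)
{
	gpu->gdrst &= gpu->stuck_mask;
}

/*
 * Model of the patched flow:
 *  1. fail early with -EIO if an earlier request is still pending,
 *  2. request the reset and poll until the requested bits clear,
 *  3. write 0 back unconditionally and report the last value seen.
 */
static int fake_domain_reset(struct fake_gpu *gpu, uint32_t hw_domain_mask,
			     unsigned int max_polls, uint32_t *status)
{
	unsigned int i;
	int err = -ETIMEDOUT; /* assumption of this model, see lead-in */

	assert(hw_domain_mask != 0);

	/* A previous request has not been flushed: refuse to stack another. */
	if (gpu->gdrst)
		return -EIO;

	gpu->gdrst = hw_domain_mask;
	*status = gpu->gdrst;

	for (i = 0; i < max_polls; i++) {
		fake_gpu_tick(gpu);
		*status = gpu->gdrst;
		if ((*status & hw_domain_mask) == 0) {
			err = 0;
			break;
		}
	}

	/* Mirrors the patch's write of 0 after the wait, success or not. */
	gpu->gdrst = 0;

	return err;
}
```

In this model, a call with a non-zero `stuck_mask` overlapping the requested domains times out and leaves the stuck bits in `*status`, which corresponds to the extra `%08x` in the new debug message. A call made while `gdrst` is already non-zero returns `-EIO` without touching the register, which is the behaviour the reply above worries would block two independent engine resets.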