Message ID | 20190625230815.32244-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Series | drm/i915: Fail harder if GPU reset fails outright |
Quoting Chris Wilson (2019-06-26 00:08:15)
> If we request a reset and the GPU fails to respond, abandon all hope. If
> the request is still stuck when we attempt to do another, fail early and
> avoid requesting multiple possibly conflicting domains be reset
> simultaneously.
>
> We should never see this in practice, and if we do, it is already too
> late.
>
> References: https://bugs.freedesktop.org/show_bug.cgi?id=110998
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 72002c0f9698..56c43f8cbc17 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -301,8 +301,16 @@ static int gen6_hw_domain_reset(struct drm_i915_private *i915,
>  				u32 hw_domain_mask)
>  {
>  	struct intel_uncore *uncore = &i915->uncore;
> +	u32 status;
>  	int err;
>
> +	/*
> +	 * Check that all previous reset requests have been flushed so
> +	 * that we don't simultaneously try to reset 2 overlapping domains.
> +	 */
> +	if (intel_uncore_read_fw(uncore, GEN6_GDRST))

Thinking about this, this does nerf our attempt to try and reset two engines
at once from different events. Put it on the back burner.
-Chris
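To make the objection above concrete, here is a minimal user-space sketch of the difference between the check as posted and a check that rejects only overlapping domains. Nothing here is driver code: the function names and the bit values are hypothetical, and the overlap-only variant is shown purely for contrast; it was not proposed in this thread.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical domain bits, chosen only for illustration. */
#define DEMO_DOMAIN_A (1u << 1)
#define DEMO_DOMAIN_B (1u << 2)

/*
 * Mirrors the check in the patch: any bit still set in the reset
 * register rejects the new request, whichever domain it targets.
 */
static int demo_check_any_pending(uint32_t pending, uint32_t requested)
{
	(void)requested; /* the posted check does not look at the new mask */
	return pending ? -EIO : 0;
}

/*
 * Hypothetical contrast: reject only when the new request overlaps
 * a domain whose reset is still pending.
 */
static int demo_check_overlap_only(uint32_t pending, uint32_t requested)
{
	return (pending & requested) ? -EIO : 0;
}
```

With domain A still pending, `demo_check_any_pending(DEMO_DOMAIN_A, DEMO_DOMAIN_B)` returns `-EIO` even though the two masks are disjoint, which is the "two engines at once from different events" case the reply is worried about.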
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 72002c0f9698..56c43f8cbc17 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -301,8 +301,16 @@ static int gen6_hw_domain_reset(struct drm_i915_private *i915,
 				u32 hw_domain_mask)
 {
 	struct intel_uncore *uncore = &i915->uncore;
+	u32 status;
 	int err;
 
+	/*
+	 * Check that all previous reset requests have been flushed so
+	 * that we don't simultaneously try to reset 2 overlapping domains.
+	 */
+	if (intel_uncore_read_fw(uncore, GEN6_GDRST))
+		return -EIO;
+
 	/*
 	 * GEN6_GDRST is not in the gt power well, no need to check
 	 * for fifo space for the write or forcewake the chip for
@@ -314,10 +322,11 @@ static int gen6_hw_domain_reset(struct drm_i915_private *i915,
 	err = __intel_wait_for_register_fw(uncore,
 					   GEN6_GDRST, hw_domain_mask, 0,
 					   500, 0,
-					   NULL);
+					   &status);
+	intel_uncore_write_fw(uncore, GEN6_GDRST, 0);
 	if (err)
-		DRM_DEBUG_DRIVER("Wait for 0x%08x engines reset failed\n",
-				 hw_domain_mask);
+		DRM_DEBUG_DRIVER("Wait for 0x%08x [HW] engines reset failed: %08x\n",
+				 hw_domain_mask, status);
 
 	return err;
 }
If we request a reset and the GPU fails to respond, abandon all hope. If
the request is still stuck when we attempt to do another, fail early and
avoid requesting multiple possibly conflicting domains be reset
simultaneously.

We should never see this in practice, and if we do, it is already too
late.

References: https://bugs.freedesktop.org/show_bug.cgi?id=110998
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)
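The control flow described in the commit message can be sketched in user space with a mock register. This is only an illustration under stated assumptions, not the driver implementation: `struct mock_gdrst`, `mock_read`, `mock_write`, `mock_hw_tick` and `mock_domain_reset` are all hypothetical names, the poll count stands in for the timed register wait, and the mock assumes that writing 0 does not clear a bit the "hardware" has failed to complete (the patch does not state how real hardware behaves here).

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the reset register. */
struct mock_gdrst {
	uint32_t value;      /* reset bits currently set */
	uint32_t stuck_mask; /* bits the mock hardware never completes */
};

static uint32_t mock_read(const struct mock_gdrst *reg)
{
	return reg->value;
}

/* Assumption of this mock: bits that are stuck survive any write. */
static void mock_write(struct mock_gdrst *reg, uint32_t v)
{
	reg->value = v | (reg->value & reg->stuck_mask);
}

/* One step of the mock hardware: every non-stuck reset completes. */
static void mock_hw_tick(struct mock_gdrst *reg)
{
	reg->value &= reg->stuck_mask;
}

static int mock_domain_reset(struct mock_gdrst *reg, uint32_t mask,
			     unsigned int max_polls, uint32_t *status)
{
	int err = -ETIMEDOUT;
	unsigned int i;

	/* Fail early if an earlier reset request never completed. */
	if (mock_read(reg))
		return -EIO;

	mock_write(reg, mask);
	*status = mock_read(reg);

	/* Poll until the requested bits clear or we run out of attempts. */
	for (i = 0; i < max_polls; i++) {
		mock_hw_tick(reg);
		*status = mock_read(reg);
		if (!(*status & mask)) {
			err = 0;
			break;
		}
	}

	/* As in the patch, write 0 after the wait regardless of outcome. */
	mock_write(reg, 0);

	return err;
}
```

In this model a reset that completes returns 0, a reset whose bit stays set returns `-ETIMEDOUT` with the leftover bits reported through `status`, and any later request then fails immediately with `-EIO` because the register still reads non-zero.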