Message ID | 20170522174641.25354-8-michel.thierry@intel.com (mailing list archive) |
---|---|
State | New, archived |
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index fdfd8c66c956..a89738655460 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -1747,8 +1747,12 @@ static int gen8_reset_engine_start(struct intel_engine_cs *engine)
 				      RESET_CTL_READY_TO_RESET,
 				      RESET_CTL_READY_TO_RESET,
 				      700);
-	if (ret)
-		DRM_ERROR("%s: reset request timeout\n", engine->name);
+	if (GEM_WARN_ON(ret)) {
+		/* hw did not ack ready-to-reset, reset anyway */
+		DRM_DEBUG_DRIVER("%s: reset request timeout, continue\n",
+				 engine->name);
+		ret = 0;
+	}
 
 	return ret;
 }
We try to get the engines ready/idle before triggering the reset, but it has been seen that sometimes the hw never acknowledges this. If we miss the acknowledgment, carry on with the reset instead of leaving the GPU in a wedged state. The frequency of missed acknowledgment from hw is low, but it has been seen at least once in CI.

References: https://intel-gfx-ci.01.org/CI/Trybot_831/
Reported-by: Antonio Argenziano <antonio.argenziano@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>

---

 drivers/gpu/drm/i915/intel_uncore.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
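
For readers without the driver source at hand, the control flow described in the commit message (wait for the ready-to-reset acknowledgment, and on timeout log it and carry on) can be sketched in user space roughly as follows. This is only an illustration, not the i915 code: `struct fake_engine`, `fake_ready_to_reset()`, `wait_for_ready()` and the poll-count timeout are hypothetical stand-ins for the register wait shown in the diff, and `fprintf` stands in for the kernel's debug logging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the hardware state of one engine. */
struct fake_engine {
	const char *name;
	int ack_after;	/* polls needed before the ack is seen; < 0 means never */
	int polls;	/* polls performed so far */
};

/* Returns true once the fake hardware reports ready-to-reset. */
static bool fake_ready_to_reset(struct fake_engine *e)
{
	e->polls++;
	return e->ack_after >= 0 && e->polls > e->ack_after;
}

/* Poll up to max_polls times; returns 0 on ack, -1 on timeout. */
static int wait_for_ready(struct fake_engine *e, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (fake_ready_to_reset(e))
			return 0;
	}
	return -1;
}

/*
 * Mirrors the patched flow: a missed acknowledgment is reported,
 * then the error is cleared so the caller proceeds with the reset.
 */
static int reset_engine_start(struct fake_engine *e, int max_polls)
{
	int ret;

	assert(e != NULL);

	ret = wait_for_ready(e, max_polls);
	if (ret) {
		/* hw did not ack ready-to-reset, reset anyway */
		fprintf(stderr, "%s: reset request timeout, continue\n",
			e->name);
		ret = 0;
	}

	return ret;
}
```

With this shape, a caller sees success whether or not the acknowledgment arrived, so the timeout no longer prevents the reset from being attempted; the only trace of the missed acknowledgment is the logged message.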