Message ID | 20211211065859.2248188-4-John.C.Harrison@Intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | More fixes/tweaks to GuC support | expand |
On 12/10/2021 10:58 PM, John.C.Harrison@Intel.com wrote: > From: John Harrison <John.C.Harrison@Intel.com> > > If GuC encounters an error during engine reset, the i915 driver > promotes to full GT reset. This includes an info message about why the > reset is happening. However, that is not treated as a failure by any > of the CI systems because resets are an expected occurrance during > testing. This kind of failure is a major problem and should never > happen. So, complain more loudly and make sure CI notices. > > Signed-off-by: John Harrison <John.C.Harrison@Intel.com> > --- > drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 14 +++++++++++--- > 1 file changed, 11 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > index 97311119da6f..6015815f1da0 100644 > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > @@ -4018,11 +4018,12 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc, > const u32 *msg, u32 len) > { > struct intel_engine_cs *engine; > + struct intel_gt *gt = guc_to_gt(guc); > u8 guc_class, instance; > u32 reason; > > if (unlikely(len != 3)) { > - drm_err(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len); > + drm_err(>->i915->drm, "Invalid length %u", len); > return -EPROTO; > } > > @@ -4032,12 +4033,19 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc, > > engine = guc_lookup_engine(guc, guc_class, instance); > if (unlikely(!engine)) { > - drm_err(&guc_to_gt(guc)->i915->drm, > + drm_err(>->i915->drm, > "Invalid engine %d:%d", guc_class, instance); > return -EPROTO; > } > > - intel_gt_handle_error(guc_to_gt(guc), engine->mask, > + /* > + * This is an unexpected failure of a hardware feature. So, log a real > + * error message not just the informational that comes with the reset. > + */ > + drm_err(>->i915->drm, "GuC engine reset request failed on %d:%d (%s) because %d", In the error handling called below, the reason is logged as 0x%08x, so IMO we should do the same here for consistency. With that: Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Daniele > + guc_class, instance, engine->name, reason); > + > + intel_gt_handle_error(gt, engine->mask, > I915_ERROR_CAPTURE, > "GuC failed to reset %s (reason=0x%08x)\n", > engine->name, reason);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 97311119da6f..6015815f1da0 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -4018,11 +4018,12 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc, const u32 *msg, u32 len) { struct intel_engine_cs *engine; + struct intel_gt *gt = guc_to_gt(guc); u8 guc_class, instance; u32 reason; if (unlikely(len != 3)) { - drm_err(&guc_to_gt(guc)->i915->drm, "Invalid length %u", len); + drm_err(>->i915->drm, "Invalid length %u", len); return -EPROTO; } @@ -4032,12 +4033,19 @@ int intel_guc_engine_failure_process_msg(struct intel_guc *guc, engine = guc_lookup_engine(guc, guc_class, instance); if (unlikely(!engine)) { - drm_err(&guc_to_gt(guc)->i915->drm, + drm_err(>->i915->drm, "Invalid engine %d:%d", guc_class, instance); return -EPROTO; } - intel_gt_handle_error(guc_to_gt(guc), engine->mask, + /* + * This is an unexpected failure of a hardware feature. So, log a real + * error message not just the informational that comes with the reset. + */ + drm_err(>->i915->drm, "GuC engine reset request failed on %d:%d (%s) because %d", + guc_class, instance, engine->name, reason); + + intel_gt_handle_error(gt, engine->mask, I915_ERROR_CAPTURE, "GuC failed to reset %s (reason=0x%08x)\n", engine->name, reason);