Message ID | 95427A2E42F76A40B7520C17720417F092F04EAE@FMSMSX126.amr.corp.intel.com (mailing list archive) |
---|---|
State | New, archived |
Quoting Singh, Satyeshwar (2018-05-31 22:17:25)
> Hi Chris,
> Isn't this dependent upon the workload submitted to the GuC? Meaning we
> have one workload that refused to be preempted (really long shader, for
> example) but it went away on its own. Other workloads that come in later
> are preemptible. However, if we turn off preemption permanently, then all
> future workloads will not be preempted either, which may not be desirable.

Whoever implements the recovery mechanism can clear the flag. You may
like to clear the flag on reset? We would have to be more careful about
the manipulation of engine->flags as it's not serialised atm (since it's
_meant_ to be write-once during init).
-Chris
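[A minimal sketch, for illustration only, of the "clear the flag on reset" idea mentioned above; the helper name and any call site are assumptions, and, as noted, engine->flags is not serialised, so the read-modify-write here would still race with a concurrent clear.]

/*
 * Hypothetical sketch (not part of the patch): re-enable preemption when
 * the engine is reset.  The helper name is invented for illustration;
 * engine->flags is meant to be write-once during init and is not
 * serialised, so this RMW could still race with inject_preempt_context()
 * clearing the bit.
 */
static void intel_engine_restore_preemption(struct intel_engine_cs *engine)
{
	engine->flags |= I915_ENGINE_HAS_PREEMPTION;
}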
On Thu, May 31, 2018 at 10:20:54PM +0100, Chris Wilson wrote:
> Quoting Singh, Satyeshwar (2018-05-31 22:17:25)
> > Hi Chris,
> > Isn't this dependent upon the workload submitted to the GuC? Meaning we
> > have one workload that refused to be preempted (really long shader, for
> > example) but it went away on its own. Other workloads that come in later
> > are preemptible. However, if we turn off preemption permanently, then all
> > future workloads will not be preempted either, which may not be desirable.
>
> Whoever implements the recovery mechanism can clear the flag. You may
> like to clear the flag on reset? We would have to be more careful about
> the manipulation of engine->flags as it's not serialised atm (since it's
> _meant_ to be write-once during init).
> -Chris

The error that would occur here is a failure of GuC to *initiate* the
preemption, and is different from a slow resolution of the preemption on
hardware caused by the workload-blocking scenario that Satyeshwar
describes. GuC will wait forever for preemption resolution, as will i915
currently without a timeout mechanism. A failure of GuC to initiate a
preemption would be a very strange and bad thing and probably would
warrant a WARN and disabling. Is anyone actually seeing that with current
firmware? I have not in my own testing. Is it an actual error returned
from GuC, or a timeout waiting for the GuC response?
-Jeff
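[A hedged sketch of the distinction drawn here, assuming intel_guc_send() reports a timeout as -ETIMEDOUT (error-code behaviour not verified for every backend): separate "GuC never answered" from "GuC explicitly rejected the request" before dropping preemption for good.]

/*
 * Sketch only: the error handling around the patch's WARN_ON is an
 * assumption for illustration, not the actual patch.
 */
int err = intel_guc_send(guc, data, ARRAY_SIZE(data));

if (err == -ETIMEDOUT) {
	/* No response from GuC at all: looks like a firmware hang and is
	 * better handled by reset/recovery than by disabling preemption.
	 */
	DRM_ERROR("GuC did not respond to the preemption request\n");
} else if (WARN_ON(err)) {
	/* GuC actively refused to initiate the preemption. */
	engine->flags &= ~I915_ENGINE_HAS_PREEMPTION; /* still racy */
}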
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 133367a17863..24bdac205c45 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -588,6 +588,7 @@ static void inject_preempt_context(struct work_struct *work)
 	data[6] = intel_guc_ggtt_offset(guc, guc->shared_data);
 
 	if (WARN_ON(intel_guc_send(guc, data, ARRAY_SIZE(data)))) {
+		engine->flags &= ~I915_ENGINE_HAS_PREEMPTION; /* XXX racy! */
 		execlists_clear_active(&engine->execlists,
 				       EXECLISTS_ACTIVE_PREEMPT);
 		tasklet_schedule(&engine->execlists.tasklet);
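[One possible direction for the "XXX racy!" note, purely as a sketch under stated assumptions (engine->flags widened to unsigned long and the flag redefined by bit number, both invented here): use atomic bitops instead of an unserialised read-modify-write.]

/* Assumed bit index; the real I915_ENGINE_HAS_PREEMPTION is a bit mask. */
#define I915_ENGINE_HAS_PREEMPTION_BIT	2

static void guc_disable_engine_preemption(struct intel_engine_cs *engine)
{
	/*
	 * Atomic clear: safe against concurrent updates of other flag bits,
	 * though a reader that has already sampled the flag will still treat
	 * the engine as preemptible for that submission.
	 */
	clear_bit(I915_ENGINE_HAS_PREEMPTION_BIT, &engine->flags);
}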