diff mbox

drm/i915/guc: Disable preemption if it fails

Message ID 95427A2E42F76A40B7520C17720417F092F04EAE@FMSMSX126.amr.corp.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Satyeshwar Singh May 31, 2018, 9:17 p.m. UTC
Hi Chris,
Isn't this dependent upon the workload submitted to the GuC? Meaning we have one workload that refused to be preempted (really long shader for example) but it went away on its own. Other workloads that come in later are preemptible. However, if we turn off preemption permanently, then all future workloads will not be preempted either which may not be desirable.
-Satyeshwar

-----Original Message-----
From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf Of Chris Wilson
Sent: Thursday, May 31, 2018 1:47 PM
To: intel-gfx@lists.freedesktop.org
Subject: [Intel-gfx] [PATCH] drm/i915/guc: Disable preemption if it fails

If we fail to tell the GuC to perform preemption, we get stuck attempting to continually retry inject_preempt_context() until we eventually timeout and reset the GPU (approximately emitting the same warning 1000 times). Bail after the first failure, emit the WARN and stop trying to do any further preemption on this engine.

References: https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_2235/shard-apl4/igt@gem_exec_schedule@preempt-bsd.html
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michałt Winiarski <michal.winiarski@intel.com>
---
 drivers/gpu/drm/i915/intel_guc_submission.c | 1 +
 1 file changed, 1 insertion(+)

--
2.17.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Comments

Chris Wilson May 31, 2018, 9:20 p.m. UTC | #1
Quoting Singh, Satyeshwar (2018-05-31 22:17:25)
> Hi Chris,
> Isn't this dependent upon the workload submitted to the GuC? Meaning we have one workload that refused to be preempted (really long shader for example) but it went away on its own. Other workloads that come in later are preemptible. However, if we turn off preemption permanently, then all future workloads will not be preempted either which may not be desirable.

Whoever implements the recovery mechanism can clear the flag. You may
like to clear the flag on reset? We would have to be more careful about
the manipulation of engine->flags as it's not serialised atm (since it's
_meant_ to be write-once during init).
-Chris
jeff.mcgee@intel.com June 2, 2018, 12:02 a.m. UTC | #2
On Thu, May 31, 2018 at 10:20:54PM +0100, Chris Wilson wrote:
> Quoting Singh, Satyeshwar (2018-05-31 22:17:25)
> > Hi Chris,
> > Isn't this dependent upon the workload submitted to the GuC? Meaning we have one workload that refused to be preempted (really long shader for example) but it went away on its own. Other workloads that come in later are preemptible. However, if we turn off preemption permanently, then all future workloads will not be preempted either which may not be desirable.
> 
> Whoever implements the recovery mechanism can clear the flag. You may
> like to clear the flag on reset? We would have to be more careful about
> the manipulation of engine->flags as it's not serialised atm (since it's
> _meant_ to be write-once during init).
> -Chris

The error that would occur here is a failure of GuC to *initiate* the
preemption, and is different from a slow resolution of the preemption on
hardware caused by the workload blocking scenario that Satyeshwar
describes. GuC will wait forever for preemption resolution, as will i915
currently without a timeout mechanism. A failure of GuC to initiate a
preemption would be a very strange and bad thing and probably would
warrant a WARN and disabling. Is anyone actually seeing that with current
firmware? I have not in my own testing. Is it an actual error returned
from GuC or a timeout waiting for GuC response?
-Jeff
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 133367a17863..24bdac205c45 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -588,6 +588,7 @@  static void inject_preempt_context(struct work_struct *work)
 	data[6] = intel_guc_ggtt_offset(guc, guc->shared_data);
 
 	if (WARN_ON(intel_guc_send(guc, data, ARRAY_SIZE(data)))) {
+		engine->flags &= ~I915_ENGINE_HAS_PREEMPTION; /* XXX racy! */
 		execlists_clear_active(&engine->execlists,
 				       EXECLISTS_ACTIVE_PREEMPT);
 		tasklet_schedule(&engine->execlists.tasklet);