drm/i915/guc: Disable preemption if it fails

Message ID 20180531204717.29567-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson May 31, 2018, 8:47 p.m. UTC
If we fail to tell the GuC to perform preemption, we get stuck
attempting to continually retry inject_preempt_context() until we
eventually time out and reset the GPU (emitting the same warning
approximately 1000 times). Bail after the first failure, emit the WARN and
stop trying to do any further preemption on this engine.

References: https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_2235/shard-apl4/igt@gem_exec_schedule@preempt-bsd.html
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
 drivers/gpu/drm/i915/intel_guc_submission.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Michel Thierry May 31, 2018, 9:16 p.m. UTC | #1
On 5/31/2018 1:47 PM, Chris Wilson wrote:
> If we fail to tell the GuC to perform preemption, we get stuck
> attempting to continually retry inject_preempt_context() until we
> eventually time out and reset the GPU (emitting the same warning
> approximately 1000 times). Bail after the first failure, emit the WARN and

I only see 340 warnings in the 4 seconds before it timed out.

<7>[ ] [drm:intel_guc_send_mmio [i915]] INTEL_GUC_SEND: Action 0x2 failed; ret=-110 status=0x00000002 response=0x40000000

The status is the same as the action, so something really bad happened
inside there.
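
For reference, ret=-110 in the log line above is -ETIMEDOUT under the usual
Linux errno numbering. Below is a minimal userspace sketch of decoding such
kernel-style return codes. ret_name() is a hypothetical helper written for
illustration only; it is not part of i915 or the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper (not an i915 or kernel function): map a
 * kernel-style return code (0 on success, negative errno on failure)
 * to a symbolic name, for the few values of interest here.
 */
const char *ret_name(int ret)
{
	/* Kernel-style codes are zero or negative. */
	assert(ret <= 0);

	switch (-ret) {
	case 0:
		return "OK";
	case ETIMEDOUT:
		return "ETIMEDOUT";
	case EIO:
		return "EIO";
	case EINVAL:
		return "EINVAL";
	default:
		return "unknown";
	}
}
```

Calling ret_name(-ETIMEDOUT) returns "ETIMEDOUT"; on platforms that use the
common Linux errno values, that is the same as ret_name(-110).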

> stop trying to do any further preemption on this engine.
> 
> References: https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_2235/shard-apl4/igt@gem_exec_schedule@preempt-bsd.html
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>

Reviewed-by: Michel Thierry <michel.thierry@intel.com>

> ---
>   drivers/gpu/drm/i915/intel_guc_submission.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 133367a17863..24bdac205c45 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -588,6 +588,7 @@ static void inject_preempt_context(struct work_struct *work)
>   	data[6] = intel_guc_ggtt_offset(guc, guc->shared_data);
>   
>   	if (WARN_ON(intel_guc_send(guc, data, ARRAY_SIZE(data)))) {
> +		engine->flags &= ~I915_ENGINE_HAS_PREEMPTION; /* XXX racy! */
>   		execlists_clear_active(&engine->execlists,
>   				       EXECLISTS_ACTIVE_PREEMPT);
>   		tasklet_schedule(&engine->execlists.tasklet);
>

Patch

diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 133367a17863..24bdac205c45 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -588,6 +588,7 @@ static void inject_preempt_context(struct work_struct *work)
 	data[6] = intel_guc_ggtt_offset(guc, guc->shared_data);
 
 	if (WARN_ON(intel_guc_send(guc, data, ARRAY_SIZE(data)))) {
+		engine->flags &= ~I915_ENGINE_HAS_PREEMPTION; /* XXX racy! */
 		execlists_clear_active(&engine->execlists,
 				       EXECLISTS_ACTIVE_PREEMPT);
 		tasklet_schedule(&engine->execlists.tasklet);
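
The idea behind the patch (clear the capability bit on the first failed send
so that no further preemption is attempted) can be modelled in userspace as
below. This is only a sketch under stated assumptions: struct model_engine,
ENGINE_HAS_PREEMPTION, failing_send(), try_preempt() and run_scheduler() are
all hypothetical stand-ins, not the i915 definitions. One plausible reading of
the "XXX racy!" comment is that the plain read-modify-write of engine->flags
may run concurrently with other users of the field; the model uses C11
atomic_fetch_and() to show one way such an update can be made atomic, and does
not claim that this is how the driver should do it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical capability bit; the value is arbitrary. */
#define ENGINE_HAS_PREEMPTION (1u << 2)

/* Hypothetical stand-in for an engine; not the i915 structure. */
struct model_engine {
	atomic_uint flags;          /* capability bits */
	unsigned int send_attempts; /* how many requests were sent */
};

/* Stand-in for a firmware request that always fails. */
static int failing_send(struct model_engine *e)
{
	e->send_attempts++;
	return -1;
}

/*
 * Attempt one preemption. Returns 0 if preemption is disabled and
 * nothing was sent, -1 if the send failed, 1 on success.
 */
int try_preempt(struct model_engine *e)
{
	assert(e != NULL);

	if (!(atomic_load(&e->flags) & ENGINE_HAS_PREEMPTION))
		return 0;

	if (failing_send(e)) {
		/* Atomically clear the bit so later calls bail early. */
		atomic_fetch_and(&e->flags, ~ENGINE_HAS_PREEMPTION);
		return -1;
	}

	return 1;
}

/* Scheduler stand-in: request preemption n times, return sends made. */
unsigned int run_scheduler(struct model_engine *e, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		try_preempt(e);

	return e->send_attempts;
}
```

With an engine initialised as { .flags = ENGINE_HAS_PREEMPTION },
run_scheduler(&e, 1000) performs a single send in this model: the first
failure clears the bit and the remaining 999 calls return early.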