diff mbox series

drm/i915/gt: Add a safety submission flush in the heartbeat

Message ID 20200615131120.18948-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series drm/i915/gt: Add a safety submission flush in the heartbeat | expand

Commit Message

Chris Wilson June 15, 2020, 1:11 p.m. UTC
Just in case everything fails (like for example "missed interrupt
syndrome" on Sandybridge), always flush the submission tasklet from the
heartbeat. This papers over such issues, but will still appear as a
second long glitch, and prevents us from detecting it unless we happen
to be performing a timed test.

v2: We rely on flush_submission() synchronizing with the tasklet on
another CPU.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 23 +++++++++----------
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |  3 +++
 2 files changed, 14 insertions(+), 12 deletions(-)

Comments

Mika Kuoppala June 15, 2020, 4:16 p.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Just in case everything fails (like for example "missed interrupt
> syndrome" on Sandybridge), always flush the submission tasklet from the
> heartbeat. This papers over such issues, but will still appear as a
> second long glitch, and prevents us from detecting it unless we happen
> to be performing a timed test.
>
> v2: We rely on flush_submission() synchronizing with the tasklet on
> another CPU.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c     | 23 +++++++++----------
>  .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |  3 +++
>  2 files changed, 14 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index d613cf31970c..31049e0bdb57 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -1094,19 +1094,18 @@ void intel_engine_flush_submission(struct intel_engine_cs *engine)
>  {
>  	struct tasklet_struct *t = &engine->execlists.tasklet;
>  
> -	if (__tasklet_is_scheduled(t)) {
> -		local_bh_disable();
> -		if (tasklet_trylock(t)) {
> -			/* Must wait for any GPU reset in progress. */
> -			if (__tasklet_is_enabled(t))
> -				t->func(t->data);
> -			tasklet_unlock(t);
> -		}
> -		local_bh_enable();
> +	/* Synchronise and wait for the tasklet on another CPU */
> +	tasklet_kill(t);
> +
> +	/* Having cancelled the tasklet, ensure that is run */
> +	local_bh_disable();
> +	if (tasklet_trylock(t)) {
> +		/* Must wait for any GPU reset in progress. */
> +		if (__tasklet_is_enabled(t))

On heartbeat context this could be an assertion I think.

But it is difficult to enforce hw sanity and still please the
user.

They will curse on glitch, even tho thye are saved!

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

-Mika

> +			t->func(t->data);
> +		tasklet_unlock(t);
>  	}
> -
> -	/* Otherwise flush the tasklet if it was running on another cpu */
> -	tasklet_unlock_wait(t);
> +	local_bh_enable();
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> index f67ad937eefb..cd20fb549b38 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> @@ -65,6 +65,9 @@ static void heartbeat(struct work_struct *wrk)
>  	struct intel_context *ce = engine->kernel_context;
>  	struct i915_request *rq;
>  
> +	/* Just in case everything has gone horribly wrong, give it a kick */
> +	intel_engine_flush_submission(engine);
> +
>  	rq = engine->heartbeat.systole;
>  	if (rq && i915_request_completed(rq)) {
>  		i915_request_put(rq);
> -- 
> 2.20.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index d613cf31970c..31049e0bdb57 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1094,19 +1094,18 @@  void intel_engine_flush_submission(struct intel_engine_cs *engine)
 {
 	struct tasklet_struct *t = &engine->execlists.tasklet;
 
-	if (__tasklet_is_scheduled(t)) {
-		local_bh_disable();
-		if (tasklet_trylock(t)) {
-			/* Must wait for any GPU reset in progress. */
-			if (__tasklet_is_enabled(t))
-				t->func(t->data);
-			tasklet_unlock(t);
-		}
-		local_bh_enable();
+	/* Synchronise and wait for the tasklet on another CPU */
+	tasklet_kill(t);
+
+	/* Having cancelled the tasklet, ensure that is run */
+	local_bh_disable();
+	if (tasklet_trylock(t)) {
+		/* Must wait for any GPU reset in progress. */
+		if (__tasklet_is_enabled(t))
+			t->func(t->data);
+		tasklet_unlock(t);
 	}
-
-	/* Otherwise flush the tasklet if it was running on another cpu */
-	tasklet_unlock_wait(t);
+	local_bh_enable();
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index f67ad937eefb..cd20fb549b38 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -65,6 +65,9 @@  static void heartbeat(struct work_struct *wrk)
 	struct intel_context *ce = engine->kernel_context;
 	struct i915_request *rq;
 
+	/* Just in case everything has gone horribly wrong, give it a kick */
+	intel_engine_flush_submission(engine);
+
 	rq = engine->heartbeat.systole;
 	if (rq && i915_request_completed(rq)) {
 		i915_request_put(rq);