diff mbox

[RESEND,4/6] drm/i915/execlists: Disable submission tasklet upon wedging

Message ID 20180716080332.32283-4-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson July 16, 2018, 8:03 a.m. UTC
If we declare the driver wedged before the GPU truly is, then we may see
the GPU complete some CS events following our cancellation. This leaves
us quite confused as we deleted all the bookkeeping and thus complain
about the inconsistent state.

We can just ignore the remaining events and let the GPU idle by not
feeding it, and so avoid trying to racily overwrite shared state. We
rely on there being a full GPU reset before unwedging, giving us the
opportunity to reset the shared state.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107188
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Tvrtko Ursulin July 16, 2018, 9:59 a.m. UTC | #1
On 16/07/2018 09:03, Chris Wilson wrote:
> If we declare the driver wedged before the GPU truly is, then we may see
> the GPU complete some CS events following our cancellation. This leaves
> us quite confused as we deleted all the bookkeeping and thus complain
> about the inconsistent state.
> 
> We can just ignore the remaining events and let the GPU idle by not
> feeding it, and so avoid trying to racily overwrite shared state. We
> rely on there being a full GPU reset before unwedging, giving us the
> opportunity to reset the shared state.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107188
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 4ef4439ff438..2c0050bab71e 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -811,6 +811,11 @@ static void reset_csb_pointers(struct intel_engine_execlists *execlists)
>   	WRITE_ONCE(*execlists->csb_write, execlists->csb_write_reset);
>   }
>   
> +static void nop_submission_tasklet(unsigned long data)
> +{
> +	/* The driver is wedged; don't process any more events. */
> +}
> +
>   static void execlists_cancel_requests(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
> @@ -871,6 +876,9 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>   	execlists->queue = RB_ROOT_CACHED;
>   	GEM_BUG_ON(port_isset(execlists->port));
>   
> +	GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet));
> +	execlists->tasklet.func = nop_submission_tasklet;
> +
>   	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   }
>   
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

I am a bit uncertain whether we are theoretically safe without any 
memory barriers in or after intel_engines_reset_default_submission. 
Concern being that we reset and re-enable everything and someone still 
see the no_submission_tasklet as the set one. There's a lot of code in 
between so hopefully it is not possible.

Regards,

Tvrtko
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4ef4439ff438..2c0050bab71e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -811,6 +811,11 @@  static void reset_csb_pointers(struct intel_engine_execlists *execlists)
 	WRITE_ONCE(*execlists->csb_write, execlists->csb_write_reset);
 }
 
+static void nop_submission_tasklet(unsigned long data)
+{
+	/* The driver is wedged; don't process any more events. */
+}
+
 static void execlists_cancel_requests(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -871,6 +876,9 @@  static void execlists_cancel_requests(struct intel_engine_cs *engine)
 	execlists->queue = RB_ROOT_CACHED;
 	GEM_BUG_ON(port_isset(execlists->port));
 
+	GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet));
+	execlists->tasklet.func = nop_submission_tasklet;
+
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }