[CI] drm/i915/execlists: Workaround switching back to a complete context

Message ID: 20200327201433.21864-1-chris@chris-wilson.co.uk (mailing list archive)
State: New, archived
Series: [CI] drm/i915/execlists: Workaround switching back to a complete context

Commit Message

Chris Wilson March 27, 2020, 8:14 p.m. UTC
In what seems remarkably similar to the w/a required to not reload an
idle context with HEAD==TAIL, it appears we must prevent the HW from
switching to an idle context in ELSP[1], while simultaneously trying to
preempt the HW to run another context and a continuation of the idle
context (which is no longer idle).

We can achieve this by preventing the context from completing while we
reload a new ELSP (by applying ring_set_paused(1) across the whole of
dequeue), except this eventually fails because a lite-restore into a
waiting semaphore does not generate an ACK. Instead, we try to avoid
making the GPU do anything too challenging and do not submit a new ELSP
while the interrupts + CSB events appear to have fallen behind the
completed contexts. We expect it to catch up shortly, so we queue
another tasklet execution and hope for the best.

Closes: https://gitlab.freedesktop.org/drm/intel/issues/1501
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)
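
In outline, the workaround is an early-out at the head of
execlists_dequeue(), condensed here from the preemption branch of the
patch below (the timeslice branch gains the identical check):

	if ((last = *active)) {
		if (need_preempt(engine, last, rb)) {
			if (i915_request_completed(last)) {
				/*
				 * The context has completed, but its CSB
				 * event is still in flight; back out and
				 * let the rescheduled tasklet retry once
				 * the event has been processed.
				 */
				tasklet_hi_schedule(&execlists->tasklet);
				return;
			}
			/* ... proceed with the preemption as before ... */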

Comments

Mika Kuoppala March 27, 2020, 8:33 p.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> In what seems remarkably similar to the w/a required to not reload an
> idle context with HEAD==TAIL, it appears we must prevent the HW from
> switching to an idle context in ELSP[1], while simultaneously trying to
> preempt the HW to run another context and a continuation of the idle
> context (which is no longer idle).
>
> We can achieve this by preventing the context from completing while we
> reload a new ELSP (by applying ring_set_paused(1) across the whole of
> dequeue), except this eventually fails because a lite-restore into a
> waiting semaphore does not generate an ACK. Instead, we try to avoid
> making the GPU do anything too challenging and do not submit a new ELSP
> while the interrupts + CSB events appear to have fallen behind the
> completed contexts. We expect it to catch up shortly, so we queue
> another tasklet execution and hope for the best.
>
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/1501
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index b12355048501..5f17ece07858 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1915,11 +1915,26 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  	 * of trouble.
>  	 */
>  	active = READ_ONCE(execlists->active);
> -	while ((last = *active) && i915_request_completed(last))
> -		active++;
>  
> -	if (last) {
> +	/*
> +	 * In theory we can skip over completed contexts that have not
> +	 * yet been processed by events (as those events are in flight):
> +	 *
> +	 * while ((last = *active) && i915_request_completed(last))
> +	 *	active++;
> +	 *
> +	 * However, the GPU is cannot handle this as it will ultimately

s/is//

I applaud the straightforward nature of this compared to the pausing,
albeit it seems to have a cost.

But this should be quite a rare event, comparatively?

> +	 * find itself trying to jump back into a context it has just
> +	 * completed and barf.
> +	 */
> +
> +	if ((last = *active)) {
>  		if (need_preempt(engine, last, rb)) {
> +			if (i915_request_completed(last)) {
> +				tasklet_hi_schedule(&execlists->tasklet);
> +				return;
> +			}
> +

I was pondering the lost tracing and whether you can
work it backwards to this condition.

But I really hope this nails it,
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

>  			ENGINE_TRACE(engine,
>  				     "preempting last=%llx:%lld, prio=%d, hint=%d\n",
>  				     last->fence.context,
> @@ -1947,6 +1962,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>  			last = NULL;
>  		} else if (need_timeslice(engine, last) &&
>  			   timer_expired(&engine->execlists.timer)) {
> +			if (i915_request_completed(last)) {
> +				tasklet_hi_schedule(&execlists->tasklet);
> +				return;
> +			}
> +
>  			ENGINE_TRACE(engine,
>  				     "expired last=%llx:%lld, prio=%d, hint=%d\n",
>  				     last->fence.context,
> -- 
> 2.20.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson March 27, 2020, 8:42 p.m. UTC | #2
Quoting Mika Kuoppala (2020-03-27 20:33:29)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > In what seems remarkably similar to the w/a required to not reload an
> > idle context with HEAD==TAIL, it appears we must prevent the HW from
> > switching to an idle context in ELSP[1], while simultaneously trying to
> > preempt the HW to run another context and a continuation of the idle
> > context (which is no longer idle).
> >
> > We can achieve this by preventing the context from completing while we
> > reload a new ELSP (by applying ring_set_paused(1) across the whole of
> > dequeue), except this eventually fails because a lite-restore into a
> > waiting semaphore does not generate an ACK. Instead, we try to avoid
> > making the GPU do anything too challenging and do not submit a new ELSP
> > while the interrupts + CSB events appear to have fallen behind the
> > completed contexts. We expect it to catch up shortly, so we queue
> > another tasklet execution and hope for the best.
> >
> > Closes: https://gitlab.freedesktop.org/drm/intel/issues/1501
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_lrc.c | 26 +++++++++++++++++++++++---
> >  1 file changed, 23 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index b12355048501..5f17ece07858 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -1915,11 +1915,26 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >        * of trouble.
> >        */
> >       active = READ_ONCE(execlists->active);
> > -     while ((last = *active) && i915_request_completed(last))
> > -             active++;
> >  
> > -     if (last) {
> > +     /*
> > +      * In theory we can skip over completed contexts that have not
> > +      * yet been processed by events (as those events are in flight):
> > +      *
> > +      * while ((last = *active) && i915_request_completed(last))
> > +      *      active++;
> > +      *
> > +      * However, the GPU is cannot handle this as it will ultimately
> 
> s/is//
> 
> I applaud the straightforward nature of this compared to the pausing,
> albeit it seems to have a cost.
>
> But this should be quite a rare event, comparatively?

In the grand scheme of things, yes, it should only be short-circuiting
the interrupt delivery prior to direct submission or a preemption bump.
But since the request is complete, there should be nothing to stop the
context from completing, triggering the CSB event and sending the
interrupt. So it should be, one hopes, about 100ns at most behind.

> > +      * find itself trying to jump back into a context it has just
> > +      * completed and barf.
> > +      */
> > +
> > +     if ((last = *active)) {
> >               if (need_preempt(engine, last, rb)) {
> > +                     if (i915_request_completed(last)) {
> > +                             tasklet_hi_schedule(&execlists->tasklet);
> > +                             return;
> > +                     }
> > +
> 
> I was pondering the lost tracing and whether you can
> work it backwards to this condition.

We back out, but we will be back again and will likely need to preempt
after we handle the next CSB event. So it's just "gpu is not ready yet,
no decision made" noise.
-Chris
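
On the lost tracing above, one hypothetical option (a sketch only, not
part of the patch; the message text is illustrative) would be to emit
the trace before backing out:

	if (i915_request_completed(last)) {
		/* Hypothetical: record why no decision was made */
		ENGINE_TRACE(engine,
			     "deferring dequeue, last=%llx:%lld completed\n",
			     last->fence.context,
			     last->fence.seqno);
		tasklet_hi_schedule(&execlists->tasklet);
		return;
	}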

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index b12355048501..5f17ece07858 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1915,11 +1915,26 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * of trouble.
 	 */
 	active = READ_ONCE(execlists->active);
-	while ((last = *active) && i915_request_completed(last))
-		active++;
 
-	if (last) {
+	/*
+	 * In theory we can skip over completed contexts that have not
+	 * yet been processed by events (as those events are in flight):
+	 *
+	 * while ((last = *active) && i915_request_completed(last))
+	 *	active++;
+	 *
+	 * However, the GPU is cannot handle this as it will ultimately
+	 * find itself trying to jump back into a context it has just
+	 * completed and barf.
+	 */
+
+	if ((last = *active)) {
 		if (need_preempt(engine, last, rb)) {
+			if (i915_request_completed(last)) {
+				tasklet_hi_schedule(&execlists->tasklet);
+				return;
+			}
+
 			ENGINE_TRACE(engine,
 				     "preempting last=%llx:%lld, prio=%d, hint=%d\n",
 				     last->fence.context,
@@ -1947,6 +1962,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			last = NULL;
 		} else if (need_timeslice(engine, last) &&
 			   timer_expired(&engine->execlists.timer)) {
+			if (i915_request_completed(last)) {
+				tasklet_hi_schedule(&execlists->tasklet);
+				return;
+			}
+
 			ENGINE_TRACE(engine,
 				     "expired last=%llx:%lld, prio=%d, hint=%d\n",
 				     last->fence.context,