diff mbox series

[10/11] drm/i915/execlists: Skip direct submission if only lite-restore

Message ID 20190226102404.29153-10-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series [01/11] drm/i915: Skip scanning for signalers if we are already inflight | expand

Commit Message

Chris Wilson Feb. 26, 2019, 10:24 a.m. UTC
If we resubmitting the active context, simply skip the submission as
performing the submission from the interrupt handler has higher
throughput than continually provoking lite-restores. If however, we find
ourselves with a new client, we check whether or not we can dequeue into
the second port or to resolve preemption.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

Comments

Tvrtko Ursulin Feb. 28, 2019, 1:20 p.m. UTC | #1
On 26/02/2019 10:24, Chris Wilson wrote:
> If we resubmitting the active context, simply skip the submission as

we are

> performing the submission from the interrupt handler has higher

 From the tasklet? You mean wait for ctx complete and then 
execlists_dequeue, instead of lite-restore?

Regards,

Tvrtko

> throughput than continually provoking lite-restores. If however, we find
> ourselves with a new client, we check whether or not we can dequeue into
> the second port or to resolve preemption.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 24 +++++++++++++++++++-----
>   1 file changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 2268860cca44..3e9f7103f31f 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1205,12 +1205,26 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
>   		tasklet_hi_schedule(&execlists->tasklet);
>   }
>   
> -static void submit_queue(struct intel_engine_cs *engine, int prio)
> +static bool inflight(const struct intel_engine_execlists *execlists,
> +		     const struct i915_request *rq)
>   {
> -	if (prio > engine->execlists.queue_priority_hint) {
> -		engine->execlists.queue_priority_hint = prio;
> +	const struct i915_request *active = port_request(execlists->port);
> +
> +	return active && active->hw_context == rq->hw_context;
> +}
> +
> +static void submit_queue(struct intel_engine_cs *engine,
> +			 const struct i915_request *rq)
> +{
> +	struct intel_engine_execlists *execlists = &engine->execlists;
> +
> +	if (rq_prio(rq) <= execlists->queue_priority_hint)
> +		return;
> +
> +	execlists->queue_priority_hint = rq_prio(rq);
> +
> +	if (!inflight(execlists, rq))
>   		__submit_queue_imm(engine);
> -	}
>   }
>   
>   static void execlists_submit_request(struct i915_request *request)
> @@ -1226,7 +1240,7 @@ static void execlists_submit_request(struct i915_request *request)
>   	GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
>   	GEM_BUG_ON(list_empty(&request->sched.link));
>   
> -	submit_queue(engine, rq_prio(request));
> +	submit_queue(engine, request);
>   
>   	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   }
>
Chris Wilson March 1, 2019, 10:22 a.m. UTC | #2
Quoting Tvrtko Ursulin (2019-02-28 13:20:50)
> 
> On 26/02/2019 10:24, Chris Wilson wrote:
> > If we resubmitting the active context, simply skip the submission as
> 
> we are

It took multiple reads to even notice that the third word was missing,
mainly because I kept skipping the "we".

> > performing the submission from the interrupt handler has higher
> 
>  From the tasklet?

Yes, the interrupt handler bottom half.

> You mean wait for ctx complete and then 
> execlists_dequeue, instead of lite-restore?

Yes. We can measure the relatively large impact of lite-restoring on
throughput (the delay in the GPU reloading the context is quite
noticeable), but it only affects a certain microbenchmark. In theory,
making sure we avoid the stall while handling the interrupt and
resubmitting should offset that cost and be much preferred for
multi-context situations, but there the interrupt overhead on an *idle*
system (~5-7us) is not as significant and so the impact seems not as
significant (in fact due to quirky hw, sometimes it is preferable not to
resubmit during an inflight CS event).

Anyway I just keep getting annoyed by the extra latency induced by
lite-restore without a good way to balance it against avoiding the CS
completion stall. And I live in fear of our tasklet being thrown to
ksoftirqd again.
-Chris
Chris Wilson March 1, 2019, 10:27 a.m. UTC | #3
Quoting Chris Wilson (2019-03-01 10:22:30)
> Quoting Tvrtko Ursulin (2019-02-28 13:20:50)
> > 
> > On 26/02/2019 10:24, Chris Wilson wrote:
> > > If we resubmitting the active context, simply skip the submission as
> > 
> > we are
> 
> It took multiple reads to even notice that the third word was missing,
> mainly because I kept skipping the "we".
> 
> > > performing the submission from the interrupt handler has higher
> > 
> >  From the tasklet?
> 
> Yes, the interrupt handler bottom half.
> 
> > You mean wait for ctx complete and then 
> > execlists_dequeue, instead of lite-restore?
> 
> Yes. We can measure the relatively large impact of lite-restoring on
> throughput (the delay in the GPU reloading the context is quite
> noticeable), but it only affects a certain microbenchmark. In theory,
> making sure we avoid the stall while handling the interrupt and
> resubmitting should offset that cost and be much preferred for
> multi-context situations, but there the interrupt overhead on an *idle*
> system (~5-7us) is not as significant and so the impact seems not as
> significant (in fact due to quirky hw, sometimes it is preferable not to
> resubmit during an inflight CS event).

On idle systems, that lite-restore cost is visible even on top of a
normal context switch, i.e. submitting
	A, AB, [CS interrupt], B 
is marginally slower than
	A, [CS interrupt], B
!!!

/o\
Sometimes I don't get our HW at all,
-Chris
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2268860cca44..3e9f7103f31f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1205,12 +1205,26 @@  static void __submit_queue_imm(struct intel_engine_cs *engine)
 		tasklet_hi_schedule(&execlists->tasklet);
 }
 
-static void submit_queue(struct intel_engine_cs *engine, int prio)
+static bool inflight(const struct intel_engine_execlists *execlists,
+		     const struct i915_request *rq)
 {
-	if (prio > engine->execlists.queue_priority_hint) {
-		engine->execlists.queue_priority_hint = prio;
+	const struct i915_request *active = port_request(execlists->port);
+
+	return active && active->hw_context == rq->hw_context;
+}
+
+static void submit_queue(struct intel_engine_cs *engine,
+			 const struct i915_request *rq)
+{
+	struct intel_engine_execlists *execlists = &engine->execlists;
+
+	if (rq_prio(rq) <= execlists->queue_priority_hint)
+		return;
+
+	execlists->queue_priority_hint = rq_prio(rq);
+
+	if (!inflight(execlists, rq))
 		__submit_queue_imm(engine);
-	}
 }
 
 static void execlists_submit_request(struct i915_request *request)
@@ -1226,7 +1240,7 @@  static void execlists_submit_request(struct i915_request *request)
 	GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
 	GEM_BUG_ON(list_empty(&request->sched.link));
 
-	submit_queue(engine, rq_prio(request));
+	submit_queue(engine, request);
 
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }