Message ID | 20190226102404.29153-10-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [01/11] drm/i915: Skip scanning for signalers if we are already inflight | expand |
On 26/02/2019 10:24, Chris Wilson wrote: > If we resubmitting the active context, simply skip the submission as we are > performing the submission from the interrupt handler has higher From the tasklet? You mean wait for ctx complete and then execlists_dequeue, instead of lite-restore? Regards, Tvrtko > throughput than continually provoking lite-restores. If however, we find > ourselves with a new client, we check whether or not we can dequeue into > the second port or to resolve preemption. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/intel_lrc.c | 24 +++++++++++++++++++----- > 1 file changed, 19 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c > index 2268860cca44..3e9f7103f31f 100644 > --- a/drivers/gpu/drm/i915/intel_lrc.c > +++ b/drivers/gpu/drm/i915/intel_lrc.c > @@ -1205,12 +1205,26 @@ static void __submit_queue_imm(struct intel_engine_cs *engine) > tasklet_hi_schedule(&execlists->tasklet); > } > > -static void submit_queue(struct intel_engine_cs *engine, int prio) > +static bool inflight(const struct intel_engine_execlists *execlists, > + const struct i915_request *rq) > { > - if (prio > engine->execlists.queue_priority_hint) { > - engine->execlists.queue_priority_hint = prio; > + const struct i915_request *active = port_request(execlists->port); > + > + return active && active->hw_context == rq->hw_context; > +} > + > +static void submit_queue(struct intel_engine_cs *engine, > + const struct i915_request *rq) > +{ > + struct intel_engine_execlists *execlists = &engine->execlists; > + > + if (rq_prio(rq) <= execlists->queue_priority_hint) > + return; > + > + execlists->queue_priority_hint = rq_prio(rq); > + > + if (!inflight(execlists, rq)) > __submit_queue_imm(engine); > - } > } > > static void execlists_submit_request(struct i915_request *request) > @@ -1226,7 +1240,7 @@ static void execlists_submit_request(struct i915_request *request) > GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); > GEM_BUG_ON(list_empty(&request->sched.link)); > > - submit_queue(engine, rq_prio(request)); > + submit_queue(engine, request); > > spin_unlock_irqrestore(&engine->timeline.lock, flags); > } >
Quoting Tvrtko Ursulin (2019-02-28 13:20:50) > > On 26/02/2019 10:24, Chris Wilson wrote: > > If we resubmitting the active context, simply skip the submission as > > we are It took multiple reads to even notice that the third word was missing, mainly because I kept skipping the "we". > > performing the submission from the interrupt handler has higher > > From the tasklet? Yes, the interrupt handler bottom half. > You mean wait for ctx complete and then > execlists_dequeue, instead of lite-restore? Yes. We can measure the relatively large impact of lite-restoring on throughput (the delay in the GPU reloading the context is quite noticeable), but it only affects a certain microbenchmark. In theory, making sure we avoid the stall while handling the interrupt and resubmitting should offset that cost and be much preferred for multi-context situations, but there the interrupt overhead on an *idle* system (~5-7us) is not as significant and so the impact seems not as significant (in fact due to quirky hw, sometimes it is preferable not to resubmit during an inflight CS event). Anyway I just keep getting annoyed by the extra latency induced by lite-restore without a good way to balance it against avoiding the CS completion stall. And I live in fear of our tasklet being thrown to ksoftirqd again. -Chris
Quoting Chris Wilson (2019-03-01 10:22:30) > Quoting Tvrtko Ursulin (2019-02-28 13:20:50) > > > > On 26/02/2019 10:24, Chris Wilson wrote: > > > If we resubmitting the active context, simply skip the submission as > > > > we are > > It took multiple reads to even notice that the third word was missing, > mainly because I kept skipping the "we". > > > > performing the submission from the interrupt handler has higher > > > > From the tasklet? > > Yes, the interrupt handler bottom half. > > > You mean wait for ctx complete and then > > execlists_dequeue, instead of lite-restore? > > Yes. We can measure the relatively large impact of lite-restoring on > throughput (the delay in the GPU reloading the context is quite > noticeable), but it only affects a certain microbenchmark. In theory, > making sure we avoid the stall while handling the interrupt and > resubmitting should offset that cost and be much preferred for > multi-context situations, but there the interrupt overhead on an *idle* > system (~5-7us) is not as significant and so the impact seems not as > significant (in fact due to quirky hw, sometimes it is preferable not to > resubmit during an inflight CS event). On idle systems, that lite-restore cost is visible even on top of a normal context switch, i.e. submitting A, AB, [CS interrupt], B is marginally slower than A, [CS interrupt], B !!! /o\ Sometimes I don't get our HW at all, -Chris
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 2268860cca44..3e9f7103f31f 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -1205,12 +1205,26 @@ static void __submit_queue_imm(struct intel_engine_cs *engine) tasklet_hi_schedule(&execlists->tasklet); } -static void submit_queue(struct intel_engine_cs *engine, int prio) +static bool inflight(const struct intel_engine_execlists *execlists, + const struct i915_request *rq) { - if (prio > engine->execlists.queue_priority_hint) { - engine->execlists.queue_priority_hint = prio; + const struct i915_request *active = port_request(execlists->port); + + return active && active->hw_context == rq->hw_context; +} + +static void submit_queue(struct intel_engine_cs *engine, + const struct i915_request *rq) +{ + struct intel_engine_execlists *execlists = &engine->execlists; + + if (rq_prio(rq) <= execlists->queue_priority_hint) + return; + + execlists->queue_priority_hint = rq_prio(rq); + + if (!inflight(execlists, rq)) __submit_queue_imm(engine); - } } static void execlists_submit_request(struct i915_request *request) @@ -1226,7 +1240,7 @@ static void execlists_submit_request(struct i915_request *request) GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); GEM_BUG_ON(list_empty(&request->sched.link)); - submit_queue(engine, rq_prio(request)); + submit_queue(engine, request); spin_unlock_irqrestore(&engine->timeline.lock, flags); }
If we resubmitting the active context, simply skip the submission as performing the submission from the interrupt handler has higher throughput than continually provoking lite-restores. If however, we find ourselves with a new client, we check whether or not we can dequeue into the second port or to resolve preemption. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/intel_lrc.c | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-)