Message ID | 20200324120718.977-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission | expand |
On 24/03/2020 12:07, Chris Wilson wrote: > We dropped calling process_csb prior to handling direct submission in > order to avoid the nesting of spinlocks and lift process_csb() and the > majority of the tasklet out of irq-off. However, we do want to avoid > ksoftirqd latency in the fast path, so try and pull the interrupt-bh > local to direct submission if we can acquire the tasklet's lock. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > --- > drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > index 210f60e14ef4..82dee2141b46 100644 > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > @@ -2891,6 +2891,12 @@ static void __submit_queue_imm(struct intel_engine_cs *engine) > if (reset_in_progress(execlists)) > return; /* defer until we restart the engine following reset */ > > + /* Hopefully we clear execlists->pending[] to let us through */ > + if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) Does access to pending needs a READ_ONCE? { > + process_csb(engine); > + tasklet_unlock(&execlists->tasklet); > + } > + > __execlists_submission_tasklet(engine); > } > > __execlists_submission_tasklet does check with READ_ONCE. I think locking is fine, given how normal flow is tasklet -> irqsave engine lock, and here we have the reverse, but a trylock. Regards, Tvrtko
On 24/03/2020 12:07, Chris Wilson wrote: > We dropped calling process_csb prior to handling direct submission in > order to avoid the nesting of spinlocks and lift process_csb() and the > majority of the tasklet out of irq-off. However, we do want to avoid > ksoftirqd latency in the fast path, so try and pull the interrupt-bh > local to direct submission if we can acquire the tasklet's lock. Oh and important question - who benefits and how much? Regards, Tvrtko > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > --- > drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > index 210f60e14ef4..82dee2141b46 100644 > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > @@ -2891,6 +2891,12 @@ static void __submit_queue_imm(struct intel_engine_cs *engine) > if (reset_in_progress(execlists)) > return; /* defer until we restart the engine following reset */ > > + /* Hopefully we clear execlists->pending[] to let us through */ > + if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) { > + process_csb(engine); > + tasklet_unlock(&execlists->tasklet); > + } > + > __execlists_submission_tasklet(engine); > } > >
Quoting Tvrtko Ursulin (2020-03-24 16:04:47) > > On 24/03/2020 12:07, Chris Wilson wrote: > > We dropped calling process_csb prior to handling direct submission in > > order to avoid the nesting of spinlocks and lift process_csb() and the > > majority of the tasklet out of irq-off. However, we do want to avoid > > ksoftirqd latency in the fast path, so try and pull the interrupt-bh > > local to direct submission if we can acquire the tasklet's lock. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> > > --- > > drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > > index 210f60e14ef4..82dee2141b46 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > > @@ -2891,6 +2891,12 @@ static void __submit_queue_imm(struct intel_engine_cs *engine) > > if (reset_in_progress(execlists)) > > return; /* defer until we restart the engine following reset */ > > > > + /* Hopefully we clear execlists->pending[] to let us through */ > > + if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) > > Does access to pending needs a READ_ONCE? Yes, we are peeking outside of the tasklet. > { > > + process_csb(engine); > > + tasklet_unlock(&execlists->tasklet); > > + } > > + > > __execlists_submission_tasklet(engine); > > } > > > > > > __execlists_submission_tasklet does check with READ_ONCE. > > I think locking is fine, given how normal flow is tasklet -> irqsave > engine lock, and here we have the reverse, but a trylock. And inside process_csb() we don't take the active lock, so we just need to serialise calls to process_csb(). We don't strictly rely on it as it's just an optimisation for latency so we can trylock and not worry about not making forward progress -- the tasklet *running* but waiting on the execlists lock is sufficient to ensure that a dequeue will happen after this point if we can't do it ourselves. -Chris
Quoting Tvrtko Ursulin (2020-03-24 16:11:10) > > On 24/03/2020 12:07, Chris Wilson wrote: > > We dropped calling process_csb prior to handling direct submission in > > order to avoid the nesting of spinlocks and lift process_csb() and the > > majority of the tasklet out of irq-off. However, we do want to avoid > > ksoftirqd latency in the fast path, so try and pull the interrupt-bh > > local to direct submission if we can acquire the tasklet's lock. > > Oh and important question - who benefits and how much? Anything that doesn't want to be deferred to a tasklet like wsim, where it helps fix a small regression in tip. That and avoiding the worker where we didn't use to have one. Did not see a difference in syslatency, so that's still a bone of contention for direction submission. -Chris
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 210f60e14ef4..82dee2141b46 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -2891,6 +2891,12 @@ static void __submit_queue_imm(struct intel_engine_cs *engine) if (reset_in_progress(execlists)) return; /* defer until we restart the engine following reset */ + /* Hopefully we clear execlists->pending[] to let us through */ + if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) { + process_csb(engine); + tasklet_unlock(&execlists->tasklet); + } + __execlists_submission_tasklet(engine); }
We dropped calling process_csb prior to handling direct submission in order to avoid the nesting of spinlocks and lift process_csb() and the majority of the tasklet out of irq-off. However, we do want to avoid ksoftirqd latency in the fast path, so try and pull the interrupt-bh local to direct submission if we can acquire the tasklet's lock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> --- drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++++ 1 file changed, 6 insertions(+)