diff mbox series

[1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission

Message ID 20200324120718.977-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series [1/2] drm/i915/execlists: Pull tasklet interrupt-bh local to direct submission | expand

Commit Message

Chris Wilson March 24, 2020, 12:07 p.m. UTC
We dropped calling process_csb prior to handling direct submission in
order to avoid the nesting of spinlocks and lift process_csb() and the
majority of the tasklet out of irq-off. However, we do want to avoid
ksoftirqd latency in the fast path, so try and pull the interrupt-bh
local to direct submission if we can acquire the tasklet's lock.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Tvrtko Ursulin March 24, 2020, 4:04 p.m. UTC | #1
On 24/03/2020 12:07, Chris Wilson wrote:
> We dropped calling process_csb prior to handling direct submission in
> order to avoid the nesting of spinlocks and lift process_csb() and the
> majority of the tasklet out of irq-off. However, we do want to avoid
> ksoftirqd latency in the fast path, so try and pull the interrupt-bh
> local to direct submission if we can acquire the tasklet's lock.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 210f60e14ef4..82dee2141b46 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2891,6 +2891,12 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
>   	if (reset_in_progress(execlists))
>   		return; /* defer until we restart the engine following reset */
>   
> +	/* Hopefully we clear execlists->pending[] to let us through */
> +	if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet))

Does access to pending needs a READ_ONCE?

  {
> +		process_csb(engine);
> +		tasklet_unlock(&execlists->tasklet);
> +	}
> +
>   	__execlists_submission_tasklet(engine);
>   }
>   
> 

__execlists_submission_tasklet does check with READ_ONCE.

I think locking is fine, given how normal flow is tasklet -> irqsave 
engine lock, and here we have the reverse, but a trylock.

Regards,

Tvrtko
Tvrtko Ursulin March 24, 2020, 4:11 p.m. UTC | #2
On 24/03/2020 12:07, Chris Wilson wrote:
> We dropped calling process_csb prior to handling direct submission in
> order to avoid the nesting of spinlocks and lift process_csb() and the
> majority of the tasklet out of irq-off. However, we do want to avoid
> ksoftirqd latency in the fast path, so try and pull the interrupt-bh
> local to direct submission if we can acquire the tasklet's lock.

Oh and important question - who benefits and how much?

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 210f60e14ef4..82dee2141b46 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -2891,6 +2891,12 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
>   	if (reset_in_progress(execlists))
>   		return; /* defer until we restart the engine following reset */
>   
> +	/* Hopefully we clear execlists->pending[] to let us through */
> +	if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) {
> +		process_csb(engine);
> +		tasklet_unlock(&execlists->tasklet);
> +	}
> +
>   	__execlists_submission_tasklet(engine);
>   }
>   
>
Chris Wilson March 24, 2020, 4:28 p.m. UTC | #3
Quoting Tvrtko Ursulin (2020-03-24 16:04:47)
> 
> On 24/03/2020 12:07, Chris Wilson wrote:
> > We dropped calling process_csb prior to handling direct submission in
> > order to avoid the nesting of spinlocks and lift process_csb() and the
> > majority of the tasklet out of irq-off. However, we do want to avoid
> > ksoftirqd latency in the fast path, so try and pull the interrupt-bh
> > local to direct submission if we can acquire the tasklet's lock.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++++
> >   1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 210f60e14ef4..82dee2141b46 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -2891,6 +2891,12 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
> >       if (reset_in_progress(execlists))
> >               return; /* defer until we restart the engine following reset */
> >   
> > +     /* Hopefully we clear execlists->pending[] to let us through */
> > +     if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet))
> 
> Does access to pending needs a READ_ONCE?

Yes, we are peeking outside of the tasklet.

>   {
> > +             process_csb(engine);
> > +             tasklet_unlock(&execlists->tasklet);
> > +     }
> > +
> >       __execlists_submission_tasklet(engine);
> >   }
> >   
> > 
> 
> __execlists_submission_tasklet does check with READ_ONCE.
> 
> I think locking is fine, given how normal flow is tasklet -> irqsave 
> engine lock, and here we have the reverse, but a trylock.

And inside process_csb() we don't take the active lock, so we just need
to serialise calls to process_csb(). We don't strictly rely on it as
it's just an optimisation for latency so we can trylock and not worry
about not making forward progress -- the tasklet *running* but waiting
on the execlists lock is sufficient to ensure that a dequeue will happen
after this point if we can't do it ourselves.
-Chris
Chris Wilson March 24, 2020, 4:32 p.m. UTC | #4
Quoting Tvrtko Ursulin (2020-03-24 16:11:10)
> 
> On 24/03/2020 12:07, Chris Wilson wrote:
> > We dropped calling process_csb prior to handling direct submission in
> > order to avoid the nesting of spinlocks and lift process_csb() and the
> > majority of the tasklet out of irq-off. However, we do want to avoid
> > ksoftirqd latency in the fast path, so try and pull the interrupt-bh
> > local to direct submission if we can acquire the tasklet's lock.
> 
> Oh and important question - who benefits and how much?

Anything that doesn't want to be deferred to a tasklet like wsim, where
it helps fix a small regression in tip. That and avoiding the worker
where we didn't use to have one. Did not see a difference in syslatency,
so that's still a bone of contention for direction submission.
-Chris
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 210f60e14ef4..82dee2141b46 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2891,6 +2891,12 @@  static void __submit_queue_imm(struct intel_engine_cs *engine)
 	if (reset_in_progress(execlists))
 		return; /* defer until we restart the engine following reset */
 
+	/* Hopefully we clear execlists->pending[] to let us through */
+	if (execlists->pending[0] && tasklet_trylock(&execlists->tasklet)) {
+		process_csb(engine);
+		tasklet_unlock(&execlists->tasklet);
+	}
+
 	__execlists_submission_tasklet(engine);
 }