
[v2,2/7] drm/i915: Disable tasklet scheduling across initial scheduling

Message ID 20180507135731.10587-2-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson May 7, 2018, 1:57 p.m. UTC
During request submission, we call the engine->schedule() function so
that we may reorder the active requests as required for inheriting the
new request's priority. This may schedule several tasklets to run on the
local CPU, but we will need to schedule the tasklets again for the new
request. Delay all the local tasklets until the end, so that we only
have to process the queue just once.

v2: Beware PREEMPT_RCU, as local_bh_disable() is then not a
superset of rcu_read_lock().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Tvrtko Ursulin May 8, 2018, 10:02 a.m. UTC | #1
On 07/05/2018 14:57, Chris Wilson wrote:
> During request submission, we call the engine->schedule() function so
> that we may reorder the active requests as required for inheriting the
> new request's priority. This may schedule several tasklets to run on the
> local CPU, but we will need to schedule the tasklets again for the new
> request. Delay all the local tasklets until the end, so that we only
> have to process the queue just once.
> 
> v2: Beware PREEMPT_RCU, as local_bh_disable() is then not a
> superset of rcu_read_lock().
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 5 ++---
>   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index e4cf76ec14a6..f336942229cf 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1110,12 +1110,11 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
>   	 * decide whether to preempt the entire chain so that it is ready to
>   	 * run at the earliest possible convenience.
>   	 */
> -	rcu_read_lock();
> +	local_bh_disable();
> +	rcu_read_lock(); /* RCU serialisation for set-wedged protection */
>   	if (engine->schedule)
>   		engine->schedule(request, &request->ctx->sched);
>   	rcu_read_unlock();
> -
> -	local_bh_disable();
>   	i915_sw_fence_commit(&request->submit);
>   	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
>   
> 

Before I wasn't sure about this one, since it lengthens the softirq off 
section and still doesn't make the re-schedule atomic. But today I am 
thinking that in normal use dependency chains should not be that deep 
for it to become an issue. In the light of that, go for it:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
Chris Wilson May 8, 2018, 10:31 a.m. UTC | #2
Quoting Tvrtko Ursulin (2018-05-08 11:02:38)
> 
> On 07/05/2018 14:57, Chris Wilson wrote:
> > During request submission, we call the engine->schedule() function so
> > that we may reorder the active requests as required for inheriting the
> > new request's priority. This may schedule several tasklets to run on the
> > local CPU, but we will need to schedule the tasklets again for the new
> > request. Delay all the local tasklets until the end, so that we only
> > have to process the queue just once.
> > 
> > v2: Beware PREEMPT_RCU, as local_bh_disable() is then not a
> > superset of rcu_read_lock().
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_request.c | 5 ++---
> >   1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index e4cf76ec14a6..f336942229cf 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -1110,12 +1110,11 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
> >        * decide whether to preempt the entire chain so that it is ready to
> >        * run at the earliest possible convenience.
> >        */
> > -     rcu_read_lock();
> > +     local_bh_disable();
> > +     rcu_read_lock(); /* RCU serialisation for set-wedged protection */
> >       if (engine->schedule)
> >               engine->schedule(request, &request->ctx->sched);
> >       rcu_read_unlock();
> > -
> > -     local_bh_disable();
> >       i915_sw_fence_commit(&request->submit);
> >       local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
> >   
> > 
> 
> Before I wasn't sure about this one, since it lengthens the softirq off 
> section and still doesn't make the re-schedule atomic. But today I am 
> thinking that in normal use dependency chains should not be that deep 
> for it to become an issue. In the light of that, go for it:

We have big problems when the dependency chain grows, simply because we
recursively build a linear list and then walk it again. It's not the
worst nightmare in the code, and I don't think anyone has complained
about it yet, but we do have a few pathological igt (gem_exec_schedule
deep/wide) to show that it can be the rate-limiting step.

As far as blocking tasklets across this goes: only the local tasklet is
blocked, and for good reason, as we reorder the queues. A tasklet on
another CPU is also (mostly) blocked by the spinlock (and local irq
off). Overall I don't think the problem is exacerbated at all by the
local_bh_disable().
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index e4cf76ec14a6..f336942229cf 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1110,12 +1110,11 @@  void __i915_request_add(struct i915_request *request, bool flush_caches)
 	 * decide whether to preempt the entire chain so that it is ready to
 	 * run at the earliest possible convenience.
 	 */
-	rcu_read_lock();
+	local_bh_disable();
+	rcu_read_lock(); /* RCU serialisation for set-wedged protection */
 	if (engine->schedule)
 		engine->schedule(request, &request->ctx->sched);
 	rcu_read_unlock();
-
-	local_bh_disable();
 	i915_sw_fence_commit(&request->submit);
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */