
[05/71] drm/i915/execlists: Disable submission tasklets when rescheduling

Message ID 20180503063757.22238-5-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson May 3, 2018, 6:36 a.m. UTC
As we reschedule the requests, we do not want the submission tasklet
running until we finish updating the priority chains. (We start
rewriting priorities from the oldest, but the dequeue looks at the most
recent in-flight request, so there is a small race window in which the
dequeue may falsely decide that preemption is required.) Combine the
tasklet kicking from adding a new request with the set-wedge protection
so that we only have to adjust the preempt-counter once to achieve both
goals.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c     | 4 ++--
 drivers/gpu/drm/i915/i915_request.c | 5 +----
 2 files changed, 3 insertions(+), 6 deletions(-)

Comments

Tvrtko Ursulin May 3, 2018, 5:49 p.m. UTC | #1
On 03/05/2018 07:36, Chris Wilson wrote:
> As we reschedule the requests, we do not want the submission tasklet
> running until we finish updating the priority chains. (We start
> rewriting priorities from the oldest, but the dequeue looks at the most
> recent in-flight, so there is a small race condition where dequeue may
> decide that preemption is falsely required.) Combine the tasklet kicking
> from adding a new request with the set-wedge protection so that we only
> have to adjust the preempt-counter once to achieve both goals.
> 
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c     | 4 ++--
>   drivers/gpu/drm/i915/i915_request.c | 5 +----
>   2 files changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5ece6ae4bdff..03cd30001b5d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -578,10 +578,10 @@ static void __fence_set_priority(struct dma_fence *fence,
>   	rq = to_request(fence);
>   	engine = rq->engine;
>   
> -	rcu_read_lock();
> +	local_bh_disable(); /* RCU serialisation for set-wedged protection */
>   	if (engine->schedule)
>   		engine->schedule(rq, attr);
> -	rcu_read_unlock();
> +	local_bh_enable(); /* kick the tasklets if queues were reprioritised */
>   }
>   
>   static void fence_set_priority(struct dma_fence *fence,
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 75061f9e48eb..0756fafa7f81 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1109,12 +1109,9 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
>   	 * decide whether to preempt the entire chain so that it is ready to
>   	 * run at the earliest possible convenience.
>   	 */
> -	rcu_read_lock();
> +	local_bh_disable();
>   	if (engine->schedule)
>   		engine->schedule(request, &request->ctx->sched);
> -	rcu_read_unlock();
> -
> -	local_bh_disable();
>   	i915_sw_fence_commit(&request->submit);
>   	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
>   
> 

AFAICS this doesn't disable tasklets running on other CPUs in parallel, 
on different engines, so they may still see a non-atomic (wrt 
schedule) snapshot of the submission queues. So I am not sure what it 
means. It prevents a tasklet from interrupting the scheduling of this 
request - but as I said, I am not sure of the benefit.

Regards,

Tvrtko
Chris Wilson May 3, 2018, 7:50 p.m. UTC | #2
Quoting Tvrtko Ursulin (2018-05-03 18:49:27)
> 
> On 03/05/2018 07:36, Chris Wilson wrote:
> > As we reschedule the requests, we do not want the submission tasklet
> > running until we finish updating the priority chains. (We start
> > rewriting priorities from the oldest, but the dequeue looks at the most
> > recent in-flight, so there is a small race condition where dequeue may
> > decide that preemption is falsely required.) Combine the tasklet kicking
> > from adding a new request with the set-wedge protection so that we only
> > have to adjust the preempt-counter once to achieve both goals.
> > 
> > Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c     | 4 ++--
> >   drivers/gpu/drm/i915/i915_request.c | 5 +----
> >   2 files changed, 3 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 5ece6ae4bdff..03cd30001b5d 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -578,10 +578,10 @@ static void __fence_set_priority(struct dma_fence *fence,
> >       rq = to_request(fence);
> >       engine = rq->engine;
> >   
> > -     rcu_read_lock();
> > +     local_bh_disable(); /* RCU serialisation for set-wedged protection */
> >       if (engine->schedule)
> >               engine->schedule(rq, attr);
> > -     rcu_read_unlock();
> > +     local_bh_enable(); /* kick the tasklets if queues were reprioritised */
> >   }
> >   
> >   static void fence_set_priority(struct dma_fence *fence,
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 75061f9e48eb..0756fafa7f81 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -1109,12 +1109,9 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
> >        * decide whether to preempt the entire chain so that it is ready to
> >        * run at the earliest possible convenience.
> >        */
> > -     rcu_read_lock();
> > +     local_bh_disable();
> >       if (engine->schedule)
> >               engine->schedule(request, &request->ctx->sched);
> > -     rcu_read_unlock();
> > -
> > -     local_bh_disable();
> >       i915_sw_fence_commit(&request->submit);
> >       local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
> >   
> > 
> 
> AFAICS this doesn't disable tasklets running on other CPUs in parallel, 
> on different engines, so they may still see a non-atomic (wrt 
> schedule) snapshot of the submission queues. So I am not sure what it 
> means. It prevents a tasklet from interrupting the scheduling of this 
> request - but as I said, I am not sure of the benefit.

That was my "oh bother" comment as well. We don't realise the benefit of
ensuring that we always process the entire chain before a concurrent
tasklet starts processing the update; but we do coalesce the double
preempt-counter manipulation into one, and only actually kick the
tasklet when rescheduling a page flip rather than being forced to wait
for ksoftirqd. Tasklets are only fast-pathed if scheduled from interrupt
context!
-Chris
Tvrtko Ursulin May 4, 2018, 9:15 a.m. UTC | #3
On 03/05/2018 20:50, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-03 18:49:27)
>>
>> On 03/05/2018 07:36, Chris Wilson wrote:
>>> As we reschedule the requests, we do not want the submission tasklet
>>> running until we finish updating the priority chains. (We start
>>> rewriting priorities from the oldest, but the dequeue looks at the most
>>> recent in-flight, so there is a small race condition where dequeue may
>>> decide that preemption is falsely required.) Combine the tasklet kicking
>>> from adding a new request with the set-wedge protection so that we only
>>> have to adjust the preempt-counter once to achieve both goals.
>>>
>>> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem.c     | 4 ++--
>>>    drivers/gpu/drm/i915/i915_request.c | 5 +----
>>>    2 files changed, 3 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 5ece6ae4bdff..03cd30001b5d 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -578,10 +578,10 @@ static void __fence_set_priority(struct dma_fence *fence,
>>>        rq = to_request(fence);
>>>        engine = rq->engine;
>>>    
>>> -     rcu_read_lock();
>>> +     local_bh_disable(); /* RCU serialisation for set-wedged protection */
>>>        if (engine->schedule)
>>>                engine->schedule(rq, attr);
>>> -     rcu_read_unlock();
>>> +     local_bh_enable(); /* kick the tasklets if queues were reprioritised */
>>>    }
>>>    
>>>    static void fence_set_priority(struct dma_fence *fence,
>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>> index 75061f9e48eb..0756fafa7f81 100644
>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>> @@ -1109,12 +1109,9 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
>>>         * decide whether to preempt the entire chain so that it is ready to
>>>         * run at the earliest possible convenience.
>>>         */
>>> -     rcu_read_lock();
>>> +     local_bh_disable();
>>>        if (engine->schedule)
>>>                engine->schedule(request, &request->ctx->sched);
>>> -     rcu_read_unlock();
>>> -
>>> -     local_bh_disable();
>>>        i915_sw_fence_commit(&request->submit);
>>>        local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
>>>    
>>>
>>
>> AFAICS this doesn't disable tasklets running on other CPUs in parallel,
>> on different engines, so they may still see a non-atomic (wrt
>> schedule) snapshot of the submission queues. So I am not sure what it
>> means. It prevents a tasklet from interrupting the scheduling of this
>> request - but as I said, I am not sure of the benefit.
> 
> That was my "oh bother" comment as well. We don't realise the benefit of
> ensuring that we always process the entire chain before a concurrent
> tasklet starts processing the update; but we do coalesce the double
> preempt-counter manipulation into one, and only actually kick the
> tasklet when rescheduling a page flip rather than being forced to wait
> for ksoftirqd. Tasklets are only fast-pathed if scheduled from interrupt
> context!

As far as I understand it, the hunk in __fence_set_priority may help 
trigger preemption sooner, rather than on the next CSB or submit 
activity - so it makes some sense.

I say some because it is a balancing question between the time taken 
to re-schedule and the alternative of just kicking the tasklet, outside 
the softirq-disabled section, once the reschedule is done.

The second hunk is a bit more difficult. It creates a longer softirq-off 
section, which is a slight negative, and I am unsure how much it 
actually closes the race with tasklets in practice. So it may be that 
the only benefit is reducing the fiddling of both the preempt count and 
local_bh_disable to a single fiddle.

Can I ask for a patch split? :)

Regards,

Tvrtko
Chris Wilson May 4, 2018, 9:31 a.m. UTC | #4
Quoting Tvrtko Ursulin (2018-05-04 10:15:20)
> On 03/05/2018 20:50, Chris Wilson wrote:
> The second hunk is a bit more difficult. It creates a longer softirq-off 
> section, which is a slight negative, and I am unsure how much it 
> actually closes the race with tasklets in practice. So it may be that 
> the only benefit is reducing the fiddling of both the preempt count and 
> local_bh_disable to a single fiddle.

Remember that local_bh_disable() is just a fiddle with preempt-count and
that the local tasklet doesn't run during the !preemptible section
anyway. So it does remove one preemption point (in clearing the
preempt-count) but that is of dubious merit since we would then need to
kick the submission tasklet again immediately (from the preempt-count
pov) afterwards.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5ece6ae4bdff..03cd30001b5d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -578,10 +578,10 @@  static void __fence_set_priority(struct dma_fence *fence,
 	rq = to_request(fence);
 	engine = rq->engine;
 
-	rcu_read_lock();
+	local_bh_disable(); /* RCU serialisation for set-wedged protection */
 	if (engine->schedule)
 		engine->schedule(rq, attr);
-	rcu_read_unlock();
+	local_bh_enable(); /* kick the tasklets if queues were reprioritised */
 }
 
 static void fence_set_priority(struct dma_fence *fence,
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 75061f9e48eb..0756fafa7f81 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1109,12 +1109,9 @@  void __i915_request_add(struct i915_request *request, bool flush_caches)
 	 * decide whether to preempt the entire chain so that it is ready to
 	 * run at the earliest possible convenience.
 	 */
-	rcu_read_lock();
+	local_bh_disable();
 	if (engine->schedule)
 		engine->schedule(request, &request->ctx->sched);
-	rcu_read_unlock();
-
-	local_bh_disable();
 	i915_sw_fence_commit(&request->submit);
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */