[03/11] drm/i915/execlists: Suppress redundant preemption

Message ID 20190226102404.29153-3-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Series [01/11] drm/i915: Skip scanning for signalers if we are already inflight

Commit Message

Chris Wilson Feb. 26, 2019, 10:23 a.m. UTC
On unwinding the active request we give it a small (limited to internal
priority levels) boost to prevent it from being gazumped a second time.
However, this means that it can be promoted to above the request that
triggered the preemption request, causing a preempt-to-idle cycle for no
change. We can avoid this if we take the boost into account when
checking if the preemption request is valid.
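
For illustration, a minimal userspace model of the comparison (a sketch:
the bit values and helper names below are illustrative stand-ins, not the
driver's actual definitions):

#include <stdbool.h>
#include <stdio.h>

#define PRIO_WAIT       (1 << 0)  /* stand-in for I915_PRIORITY_WAIT */
#define PRIO_NEWCLIENT  (1 << 1)  /* stand-in for I915_PRIORITY_NEWCLIENT */
#define ACTIVE_PRIORITY PRIO_NEWCLIENT
#define NO_PREEMPTION   PRIO_WAIT

/* Old check: only mask out mere WAIT boosts. */
static int effective_prio_old(int prio, bool started)
{
	(void)started;
	return prio | NO_PREEMPTION;
}

/* New check: also fold in the boost the request would gain on unwind. */
static int effective_prio_new(int prio, bool started)
{
	if ((~prio & ACTIVE_PRIORITY) && started) {
		prio |= ACTIVE_PRIORITY;
		prio--; /* an unwound request requeues behind equal priority */
	}
	return prio | NO_PREEMPTION;
}

int main(void)
{
	int active = PRIO_WAIT;      /* running request, WAIT-boosted */
	int queued = PRIO_NEWCLIENT; /* freshly submitted client */

	/* old: 2 > 1, so we preempt-to-idle only to select the unwound,
	 * now WAIT|NEWCLIENT-boosted request straight back again */
	printf("old: preempt=%d\n", queued > effective_prio_old(active, true));

	/* new: 2 > 3 is false, the redundant cycle is suppressed */
	printf("new: preempt=%d\n", queued > effective_prio_new(active, true));

	return 0;
}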

v2: After preemption the active request will be after the preemptee if
they end up with equal priority.

v3: Tvrtko pointed out that this, the existing logic, makes
I915_PRIORITY_WAIT non-preemptible. Document this interesting quirk!

v4: Prove Tvrtko was right about WAIT being non-preemptible and test it.
v5: Except not all priorities were made equal, and WAIT not preempting
only applies if we start off as !NEWCLIENT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 38 ++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

Comments

Tvrtko Ursulin Feb. 28, 2019, 1:11 p.m. UTC | #1
On 26/02/2019 10:23, Chris Wilson wrote:
> [snip commit message]
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 0e20f3bc8210..dba19baf6808 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -164,6 +164,8 @@
>   #define WA_TAIL_DWORDS 2
>   #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
>   
> +#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
> +
>   static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>   					    struct intel_engine_cs *engine,
>   					    struct intel_context *ce);
> @@ -190,8 +192,30 @@ static inline int rq_prio(const struct i915_request *rq)
>   
>   static int effective_prio(const struct i915_request *rq)
>   {
> +	int prio = rq_prio(rq);
> +
> +	/*
> +	 * On unwinding the active request, we give it a priority bump
> +	 * equivalent to a freshly submitted request. This protects it from
> +	 * being gazumped again, but it would be preferable if we didn't
> +	 * let it be gazumped in the first place!
> +	 *
> +	 * See __unwind_incomplete_requests()
> +	 */
> +	if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(rq)) {
> +		/*
> +		 * After preemption, we insert the active request at the
> +		 * end of the new priority level. This means that we will be
> +		 * _lower_ priority than the preemptee all things equal (and
> +		 * so the preemption is valid), so adjust our comparison
> +		 * accordingly.
> +		 */
> +		prio |= ACTIVE_PRIORITY;
> +		prio--;
> +	}
> +
>   	/* Restrict mere WAIT boosts from triggering preemption */
> -	return rq_prio(rq) | __NO_PREEMPTION;
> +	return prio | __NO_PREEMPTION;
>   }
>   
>   static int queue_prio(const struct intel_engine_execlists *execlists)
> @@ -359,7 +383,7 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
>   {
>   	struct i915_request *rq, *rn, *active = NULL;
>   	struct list_head *uninitialized_var(pl);
> -	int prio = I915_PRIORITY_INVALID | I915_PRIORITY_NEWCLIENT;
> +	int prio = I915_PRIORITY_INVALID | ACTIVE_PRIORITY;
>   
>   	lockdep_assert_held(&engine->timeline.lock);
>   
> @@ -390,9 +414,15 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
>   	 * The active request is now effectively the start of a new client
>   	 * stream, so give it the equivalent small priority bump to prevent
>   	 * it being gazumped a second time by another peer.
> +	 *
> +	 * One consequence of this preemption boost is that we may jump
> +	 * over lesser priorities (such as I915_PRIORITY_WAIT), effectively
> +	 * making those priorities non-preemptible. They will be moved forward

After the previous patch, wait priority is non-preemptible by definition, 
so the suggestion here that the preemption boost is what makes it so is 
not accurate.

> +	 * in the priority queue, but they will not gain immediate access to
> +	 * the GPU.
>   	 */
> -	if (!(prio & I915_PRIORITY_NEWCLIENT)) {
> -		prio |= I915_PRIORITY_NEWCLIENT;
> +	if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(active)) {

What is the importance of the has_started check? Hasn't the active 
request been running by definition?

> +		prio |= ACTIVE_PRIORITY;
>   		active->sched.attr.priority = prio;
>   		list_move_tail(&active->sched.link,
>   			       i915_sched_lookup_priolist(engine, prio));
> 

Regards,

Tvrtko
Tvrtko Ursulin March 1, 2019, 11:31 a.m. UTC | #2
ping on below

On 28/02/2019 13:11, Tvrtko Ursulin wrote:
> [snip]
> 
> After the previous patch, wait priority is non-preemptible by definition, 
> so the suggestion here that the preemption boost is what makes it so is 
> not accurate.
> 
> [snip]
> 
> What is the importance of the has_started check? Hasn't the active 
> request been running by definition?
> 
> Regards,
> 
> Tvrtko
Chris Wilson March 1, 2019, 11:36 a.m. UTC | #3
Quoting Tvrtko Ursulin (2019-03-01 11:31:26)
> 
> ping on below
> 
> On 28/02/2019 13:11, Tvrtko Ursulin wrote:
> > [snip]
> > 
> > What is the importance of the has_started check? Hasn't the active 
> > request been running by definition?

No. Semaphores. This is all about defending against incorrect promotion
while a request is still spinning on its dependencies (or else we get
promoted above them and priority inheritance (PI) is broken).
-Chris
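
For reference, the check under discussion, modelled in userspace (a sketch:
the driver's __i915_request_has_started() lives in i915_request.h and reads
the current seqno from the timeline HWSP; the names below are stand-ins):

#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe seqno comparison, after the driver's i915_seqno_passed(). */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/*
 * The initial breadcrumb writes fence.seqno - 1 to the timeline HWSP and
 * is emitted after any semaphore waits, so a request still spinning on a
 * semaphore reports has_started == false.
 */
static bool request_has_started(uint32_t hwsp_seqno, uint32_t fence_seqno)
{
	return seqno_passed(hwsp_seqno, fence_seqno - 1);
}

int main(void)
{
	/* HWSP shows 41 and the request's fence.seqno is 42: the initial
	 * breadcrumb (writing 41) has executed, so the request has started. */
	return request_has_started(41, 42) ? 0 : 1;
}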
Tvrtko Ursulin March 1, 2019, 3:07 p.m. UTC | #4
On 01/03/2019 11:36, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-01 11:31:26)
>> [snip]
>>> What is the importance of the has_started check? Hasn't the active
>>> request been running by definition?
> 
> No. Semaphores. This is all about defending against incorrect promotion
> while a request is still spinning on its dependencies (or else we get
> promoted above them and priority inheritance (PI) is broken).

So init_breadcrumb is emitted after the semaphore, i.e. 
__i915_request_has_started will be false while spinning on the semaphore. 
That possibly makes sense... But you know what I'll say next: it is 
extremely subtle and sprinkled over the code, so here we definitely need 
a comment explaining it.

Regards,

Tvrtko
Chris Wilson March 1, 2019, 3:14 p.m. UTC | #5
Quoting Tvrtko Ursulin (2019-03-01 15:07:58)
> 
> On 01/03/2019 11:36, Chris Wilson wrote:
> > [snip]
> > 
> > No. Semaphores. This is all about defending against incorrect promotion
> > while a request is still spinning on its dependencies (or else we get
> > promoted above them and priority inheritance (PI) is broken).
> 
> So init_breadcrumb is emitted after the semaphore, i.e. 
> __i915_request_has_started will be false while spinning on the semaphore. 

Yes, that's the raison d'être for adding init_breadcrumb and
i915_request_has_started.

> That possibly makes sense... But you know what I'll say next: it is 
> extremely subtle and sprinkled over the code, so here we definitely need 
> a comment explaining it.

I've been trying, but it's all baked into the meaning of has-started and
active request. :| But I didn't extend the comment before
i915_request_has_started() to explain the bit about semaphores, so mea
culpa.
-Chris
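
A sketch of the ordering Tvrtko and Chris are describing, written as the
kind of comment being asked for (based on the emission order discussed in
the thread; an illustration, not the driver's actual documentation):

/*
 * Per-request ring contents, in execution order:
 *
 *   1. semaphore waits on the request's unresolved dependencies
 *   2. initial breadcrumb: writes rq->fence.seqno - 1 to the timeline HWSP
 *   3. the user payload
 *   4. final breadcrumb: writes rq->fence.seqno
 *
 * __i915_request_has_started() keys off step 2, so it reports true only
 * once the request is past its semaphores. A request merely spinning on a
 * semaphore is thus not treated as active and is denied the unwind boost,
 * preserving priority inheritance over its still-pending dependencies.
 */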
Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0e20f3bc8210..dba19baf6808 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -164,6 +164,8 @@ 
 #define WA_TAIL_DWORDS 2
 #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
 
+#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
+
 static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 					    struct intel_engine_cs *engine,
 					    struct intel_context *ce);
@@ -190,8 +192,30 @@  static inline int rq_prio(const struct i915_request *rq)
 
 static int effective_prio(const struct i915_request *rq)
 {
+	int prio = rq_prio(rq);
+
+	/*
+	 * On unwinding the active request, we give it a priority bump
+	 * equivalent to a freshly submitted request. This protects it from
+	 * being gazumped again, but it would be preferable if we didn't
+	 * let it be gazumped in the first place!
+	 *
+	 * See __unwind_incomplete_requests()
+	 */
+	if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(rq)) {
+		/*
+		 * After preemption, we insert the active request at the
+		 * end of the new priority level. This means that we will be
+		 * _lower_ priority than the preemptee all things equal (and
+		 * so the preemption is valid), so adjust our comparison
+		 * accordingly.
+		 */
+		prio |= ACTIVE_PRIORITY;
+		prio--;
+	}
+
 	/* Restrict mere WAIT boosts from triggering preemption */
-	return rq_prio(rq) | __NO_PREEMPTION;
+	return prio | __NO_PREEMPTION;
 }
 
 static int queue_prio(const struct intel_engine_execlists *execlists)
@@ -359,7 +383,7 @@  __unwind_incomplete_requests(struct intel_engine_cs *engine)
 {
 	struct i915_request *rq, *rn, *active = NULL;
 	struct list_head *uninitialized_var(pl);
-	int prio = I915_PRIORITY_INVALID | I915_PRIORITY_NEWCLIENT;
+	int prio = I915_PRIORITY_INVALID | ACTIVE_PRIORITY;
 
 	lockdep_assert_held(&engine->timeline.lock);
 
@@ -390,9 +414,15 @@  __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	 * The active request is now effectively the start of a new client
 	 * stream, so give it the equivalent small priority bump to prevent
 	 * it being gazumped a second time by another peer.
+	 *
+	 * One consequence of this preemption boost is that we may jump
+	 * over lesser priorities (such as I915_PRIORITY_WAIT), effectively
+	 * making those priorities non-preemptible. They will be moved forward
+	 * in the priority queue, but they will not gain immediate access to
+	 * the GPU.
 	 */
-	if (!(prio & I915_PRIORITY_NEWCLIENT)) {
-		prio |= I915_PRIORITY_NEWCLIENT;
+	if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(active)) {
+		prio |= ACTIVE_PRIORITY;
 		active->sched.attr.priority = prio;
 		list_move_tail(&active->sched.link,
 			       i915_sched_lookup_priolist(engine, prio));