diff mbox series

[09/11] drm/i915/execlists: Refactor out can_merge_rq()

Message ID 20190130021906.17933-9-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series [01/11] drm/i915: Revoke mmaps and prevent access to fence registers across reset | expand

Commit Message

Chris Wilson Jan. 30, 2019, 2:19 a.m. UTC
In the next patch, we add another user that wants to check whether
requests can be merged into a single HW execution, and in the future we
want to add more conditions under which requests from the same context
cannot be merged. In preparation, extract out can_merge_rq().

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 30 +++++++++++++++++++-----------
 1 file changed, 19 insertions(+), 11 deletions(-)

Comments

Tvrtko Ursulin Jan. 30, 2019, 6:05 p.m. UTC | #1
On 30/01/2019 02:19, Chris Wilson wrote:
> In the next patch, we add another user that wants to check whether
> requests can be merge into a single HW execution, and in the future we
> want to add more conditions under which requests from the same context
> cannot be merge. In preparation, extract out can_merge_rq().
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 30 +++++++++++++++++++-----------
>   1 file changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 2616b0b3e8d5..e97ce54138d3 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -285,12 +285,11 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
>   }
>   
>   __maybe_unused static inline bool
> -assert_priority_queue(const struct intel_engine_execlists *execlists,
> -		      const struct i915_request *prev,
> +assert_priority_queue(const struct i915_request *prev,
>   		      const struct i915_request *next)
>   {
> -	if (!prev)
> -		return true;
> +	const struct intel_engine_execlists *execlists =
> +		&prev->engine->execlists;
>   
>   	/*
>   	 * Without preemption, the prev may refer to the still active element
> @@ -601,6 +600,17 @@ static bool can_merge_ctx(const struct intel_context *prev,
>   	return true;
>   }
>   
> +static bool can_merge_rq(const struct i915_request *prev,
> +			 const struct i915_request *next)
> +{
> +	GEM_BUG_ON(!assert_priority_queue(prev, next));
> +
> +	if (!can_merge_ctx(prev->hw_context, next->hw_context))
> +		return false;
> +
> +	return true;

I'll assume you'll be adding here in the future as the reason this is 
not simply "return can_merge_ctx(...)"?

> +}
> +
>   static void port_assign(struct execlist_port *port, struct i915_request *rq)
>   {
>   	GEM_BUG_ON(rq == port_request(port));
> @@ -753,8 +763,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		int i;
>   
>   		priolist_for_each_request_consume(rq, rn, p, i) {
> -			GEM_BUG_ON(!assert_priority_queue(execlists, last, rq));
> -
>   			/*
>   			 * Can we combine this request with the current port?
>   			 * It has to be the same context/ringbuffer and not
> @@ -766,8 +774,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   			 * second request, and so we never need to tell the
>   			 * hardware about the first.
>   			 */
> -			if (last &&
> -			    !can_merge_ctx(rq->hw_context, last->hw_context)) {
> +			if (last && !can_merge_rq(last, rq)) {
> +				if (last->hw_context == rq->hw_context)
> +					goto done;

I don't get this added check. AFAICS it will only trigger with GVT 
making it not consider filling both ports if possible.

> +
>   				/*
>   				 * If we are on the second port and cannot
>   				 * combine this request with the last, then we
> @@ -787,7 +797,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   				    ctx_single_port_submission(rq->hw_context))
>   					goto done;
>   
> -				GEM_BUG_ON(last->hw_context == rq->hw_context);

This is related to the previous comment. Rebase error?

>   
>   				if (submit)
>   					port_assign(port, last);
> @@ -827,8 +836,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	 * request triggering preemption on the next dequeue (or subsequent
>   	 * interrupt for secondary ports).
>   	 */
> -	execlists->queue_priority_hint =
> -		port != execlists->port ? rq_prio(last) : INT_MIN;
> +	execlists->queue_priority_hint = queue_prio(execlists);

This shouldn't be in this patch.

>   
>   	if (submit) {
>   		port_assign(port, last);
> 

Regards,

Tvrtko
Chris Wilson Jan. 30, 2019, 6:14 p.m. UTC | #2
Quoting Tvrtko Ursulin (2019-01-30 18:05:42)
> 
> On 30/01/2019 02:19, Chris Wilson wrote:
> > In the next patch, we add another user that wants to check whether
> > requests can be merge into a single HW execution, and in the future we
> > want to add more conditions under which requests from the same context
> > cannot be merge. In preparation, extract out can_merge_rq().
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/intel_lrc.c | 30 +++++++++++++++++++-----------
> >   1 file changed, 19 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 2616b0b3e8d5..e97ce54138d3 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -285,12 +285,11 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
> >   }
> >   
> >   __maybe_unused static inline bool
> > -assert_priority_queue(const struct intel_engine_execlists *execlists,
> > -                   const struct i915_request *prev,
> > +assert_priority_queue(const struct i915_request *prev,
> >                     const struct i915_request *next)
> >   {
> > -     if (!prev)
> > -             return true;
> > +     const struct intel_engine_execlists *execlists =
> > +             &prev->engine->execlists;
> >   
> >       /*
> >        * Without preemption, the prev may refer to the still active element
> > @@ -601,6 +600,17 @@ static bool can_merge_ctx(const struct intel_context *prev,
> >       return true;
> >   }
> >   
> > +static bool can_merge_rq(const struct i915_request *prev,
> > +                      const struct i915_request *next)
> > +{
> > +     GEM_BUG_ON(!assert_priority_queue(prev, next));
> > +
> > +     if (!can_merge_ctx(prev->hw_context, next->hw_context))
> > +             return false;
> > +
> > +     return true;
> 
> I'll assume you'll be adding here in the future as the reason this is 
> not simply "return can_merge_ctx(...)"?

Yes, raison d'etre of making the change.

> >   static void port_assign(struct execlist_port *port, struct i915_request *rq)
> >   {
> >       GEM_BUG_ON(rq == port_request(port));
> > @@ -753,8 +763,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >               int i;
> >   
> >               priolist_for_each_request_consume(rq, rn, p, i) {
> > -                     GEM_BUG_ON(!assert_priority_queue(execlists, last, rq));
> > -
> >                       /*
> >                        * Can we combine this request with the current port?
> >                        * It has to be the same context/ringbuffer and not
> > @@ -766,8 +774,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >                        * second request, and so we never need to tell the
> >                        * hardware about the first.
> >                        */
> > -                     if (last &&
> > -                         !can_merge_ctx(rq->hw_context, last->hw_context)) {
> > +                     if (last && !can_merge_rq(last, rq)) {
> > +                             if (last->hw_context == rq->hw_context)
> > +                                     goto done;
> 
> I don't get this added check. AFAICS it will only trigger with GVT 
> making it not consider filling both ports if possible.

Because we are preparing for can_merge_rq() deciding not to merge the
same context. If we do that we can't continue on to the next port and
must terminate the loop, violating the trick with the hint in the
process.

This changes due to the next patch, per-context freq and probably more
that I've forgotten.

> > +
> >                               /*
> >                                * If we are on the second port and cannot
> >                                * combine this request with the last, then we
> > @@ -787,7 +797,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >                                   ctx_single_port_submission(rq->hw_context))
> >                                       goto done;
> >   
> > -                             GEM_BUG_ON(last->hw_context == rq->hw_context);
> 
> This is related to the previous comment. Rebase error?

Previous if check, so it's clear at this point that we can't be using
the same.

> > @@ -827,8 +836,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >        * request triggering preemption on the next dequeue (or subsequent
> >        * interrupt for secondary ports).
> >        */
> > -     execlists->queue_priority_hint =
> > -             port != execlists->port ? rq_prio(last) : INT_MIN;
> > +     execlists->queue_priority_hint = queue_prio(execlists);
> 
> This shouldn't be in this patch.

If we terminate the loop early, we need to look at the head of the
queue.
-Chris
Tvrtko Ursulin Jan. 31, 2019, 9:19 a.m. UTC | #3
On 30/01/2019 18:14, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-01-30 18:05:42)
>>
>> On 30/01/2019 02:19, Chris Wilson wrote:
>>> In the next patch, we add another user that wants to check whether
>>> requests can be merge into a single HW execution, and in the future we
>>> want to add more conditions under which requests from the same context
>>> cannot be merge. In preparation, extract out can_merge_rq().
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/intel_lrc.c | 30 +++++++++++++++++++-----------
>>>    1 file changed, 19 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index 2616b0b3e8d5..e97ce54138d3 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -285,12 +285,11 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
>>>    }
>>>    
>>>    __maybe_unused static inline bool
>>> -assert_priority_queue(const struct intel_engine_execlists *execlists,
>>> -                   const struct i915_request *prev,
>>> +assert_priority_queue(const struct i915_request *prev,
>>>                      const struct i915_request *next)
>>>    {
>>> -     if (!prev)
>>> -             return true;
>>> +     const struct intel_engine_execlists *execlists =
>>> +             &prev->engine->execlists;
>>>    
>>>        /*
>>>         * Without preemption, the prev may refer to the still active element
>>> @@ -601,6 +600,17 @@ static bool can_merge_ctx(const struct intel_context *prev,
>>>        return true;
>>>    }
>>>    
>>> +static bool can_merge_rq(const struct i915_request *prev,
>>> +                      const struct i915_request *next)
>>> +{
>>> +     GEM_BUG_ON(!assert_priority_queue(prev, next));
>>> +
>>> +     if (!can_merge_ctx(prev->hw_context, next->hw_context))
>>> +             return false;
>>> +
>>> +     return true;
>>
>> I'll assume you'll be adding here in the future as the reason this is
>> not simply "return can_merge_ctx(...)"?
> 
> Yes, raison d'etre of making the change.
> 
>>>    static void port_assign(struct execlist_port *port, struct i915_request *rq)
>>>    {
>>>        GEM_BUG_ON(rq == port_request(port));
>>> @@ -753,8 +763,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>                int i;
>>>    
>>>                priolist_for_each_request_consume(rq, rn, p, i) {
>>> -                     GEM_BUG_ON(!assert_priority_queue(execlists, last, rq));
>>> -
>>>                        /*
>>>                         * Can we combine this request with the current port?
>>>                         * It has to be the same context/ringbuffer and not
>>> @@ -766,8 +774,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>                         * second request, and so we never need to tell the
>>>                         * hardware about the first.
>>>                         */
>>> -                     if (last &&
>>> -                         !can_merge_ctx(rq->hw_context, last->hw_context)) {
>>> +                     if (last && !can_merge_rq(last, rq)) {
>>> +                             if (last->hw_context == rq->hw_context)
>>> +                                     goto done;
>>
>> I don't get this added check. AFAICS it will only trigger with GVT
>> making it not consider filling both ports if possible.
> 
> Because we are preparing for can_merge_rq() deciding not to merge the
> same context. If we do that we can't continue on to the next port and
> must terminate the loop, violating the trick with the hint in the
> process.
> 
> This changes due to the next patch, per-context freq and probably more
> that I've forgotten.

After a second look, I noticed the existing GVT comment a bit lower down 
which avoids populating port1 already.

Maybe one thing which would make sense is to re-arange these checks in 
the order of "priority", like:

	if (last && !can_merge_rq(...)) {
		// naturally highest prio since it is impossible
		if (port == last_port)
			goto done;
		// 2nd highest to account for programming limitation
		else if (last->hw_context == rq->hw_context)
			goto done;
		// GVT check simplified (I think - since we know last is either 
different ctx or single submit)
		else if (ctx_single_port_submission(rq->hw_context))
			goto done;
> 
>>> +
>>>                                /*
>>>                                 * If we are on the second port and cannot
>>>                                 * combine this request with the last, then we
>>> @@ -787,7 +797,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>                                    ctx_single_port_submission(rq->hw_context))
>>>                                        goto done;
>>>    
>>> -                             GEM_BUG_ON(last->hw_context == rq->hw_context);
>>
>> This is related to the previous comment. Rebase error?
> 
> Previous if check, so it's clear at this point that we can't be using
> the same.

Yep.

> 
>>> @@ -827,8 +836,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>         * request triggering preemption on the next dequeue (or subsequent
>>>         * interrupt for secondary ports).
>>>         */
>>> -     execlists->queue_priority_hint =
>>> -             port != execlists->port ? rq_prio(last) : INT_MIN;
>>> +     execlists->queue_priority_hint = queue_prio(execlists);
>>
>> This shouldn't be in this patch.
> 
> If we terminate the loop early, we need to look at the head of the
> queue.

Why it is different for ending early for any other (existing) reason? 
Although I concede better management of queue_priority_hint is exactly 
what I was suggesting. Oops. Consequences are not entirely straight 
forward though.. if we decide not to submit all of a single context, or 
leave port1 empty, currently we would hint scheduling the tasklet for 
any new submission. With this change only after a CS or if a higher ctx 
is submitted. Which is what makes me feel it should be a separate patch 
for a behaviour change (since a high prio, higher than INT_MIN, is 
potentially head of the queue).

Regards,

Tvrtko
Chris Wilson Jan. 31, 2019, 9:30 a.m. UTC | #4
Quoting Tvrtko Ursulin (2019-01-31 09:19:18)
> 
> On 30/01/2019 18:14, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-01-30 18:05:42)
> >>
> >> On 30/01/2019 02:19, Chris Wilson wrote:
> >>> @@ -827,8 +836,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >>>         * request triggering preemption on the next dequeue (or subsequent
> >>>         * interrupt for secondary ports).
> >>>         */
> >>> -     execlists->queue_priority_hint =
> >>> -             port != execlists->port ? rq_prio(last) : INT_MIN;
> >>> +     execlists->queue_priority_hint = queue_prio(execlists);
> >>
> >> This shouldn't be in this patch.
> > 
> > If we terminate the loop early, we need to look at the head of the
> > queue.
> 
> Why it is different for ending early for any other (existing) reason? 
> Although I concede better management of queue_priority_hint is exactly 
> what I was suggesting. Oops. Consequences are not entirely straight 
> forward though.. if we decide not to submit all of a single context, or 
> leave port1 empty, currently we would hint scheduling the tasklet for 
> any new submission. With this change only after a CS or if a higher ctx 
> is submitted. Which is what makes me feel it should be a separate patch 
> for a behaviour change (since a high prio, higher than INT_MIN, is 
> potentially head of the queue).

Not quite. Previously if we saw port1 was empty it meant that last was
invalid and so the right choice was INT_MIN as the queue was empty. In
all other cases last is the first request in the priority list.

After this patch, we cannot draw the same conclusions from port1 being
empty, and nor can we directly inspect last. So to get the same result
as before the patch, we must actually look at the priority queue.
-Chris
Chris Wilson Jan. 31, 2019, 9:36 a.m. UTC | #5
Quoting Tvrtko Ursulin (2019-01-31 09:19:18)
> 
> On 30/01/2019 18:14, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-01-30 18:05:42)
> >>> @@ -766,8 +774,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >>>                         * second request, and so we never need to tell the
> >>>                         * hardware about the first.
> >>>                         */
> >>> -                     if (last &&
> >>> -                         !can_merge_ctx(rq->hw_context, last->hw_context)) {
> >>> +                     if (last && !can_merge_rq(last, rq)) {
> >>> +                             if (last->hw_context == rq->hw_context)
> >>> +                                     goto done;
> >>
> >> I don't get this added check. AFAICS it will only trigger with GVT
> >> making it not consider filling both ports if possible.
> > 
> > Because we are preparing for can_merge_rq() deciding not to merge the
> > same context. If we do that we can't continue on to the next port and
> > must terminate the loop, violating the trick with the hint in the
> > process.
> > 
> > This changes due to the next patch, per-context freq and probably more
> > that I've forgotten.
> 
> After a second look, I noticed the existing GVT comment a bit lower down 
> which avoids populating port1 already.
> 
> Maybe one thing which would make sense is to re-arange these checks in 
> the order of "priority", like:
> 
>         if (last && !can_merge_rq(...)) {
>                 // naturally highest prio since it is impossible
>                 if (port == last_port)
>                         goto done;
>                 // 2nd highest to account for programming limitation
>                 else if (last->hw_context == rq->hw_context)
>                         goto done;

I was tempted to pull the last_port and context checks together.

>                 // GVT check simplified (I think - since we know last is either 
> different ctx or single submit)
>                 else if (ctx_single_port_submission(rq->hw_context))
>                         goto done;

And that's what I think and I tried to get gvt to clarify that their
checks are excessive. And I'll keep on suggesting that they remove their
poking around inside the scheduler... :-p

But it's definitely something I want out of sight.
-Chris
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2616b0b3e8d5..e97ce54138d3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -285,12 +285,11 @@  static inline bool need_preempt(const struct intel_engine_cs *engine,
 }
 
 __maybe_unused static inline bool
-assert_priority_queue(const struct intel_engine_execlists *execlists,
-		      const struct i915_request *prev,
+assert_priority_queue(const struct i915_request *prev,
 		      const struct i915_request *next)
 {
-	if (!prev)
-		return true;
+	const struct intel_engine_execlists *execlists =
+		&prev->engine->execlists;
 
 	/*
 	 * Without preemption, the prev may refer to the still active element
@@ -601,6 +600,17 @@  static bool can_merge_ctx(const struct intel_context *prev,
 	return true;
 }
 
+static bool can_merge_rq(const struct i915_request *prev,
+			 const struct i915_request *next)
+{
+	GEM_BUG_ON(!assert_priority_queue(prev, next));
+
+	if (!can_merge_ctx(prev->hw_context, next->hw_context))
+		return false;
+
+	return true;
+}
+
 static void port_assign(struct execlist_port *port, struct i915_request *rq)
 {
 	GEM_BUG_ON(rq == port_request(port));
@@ -753,8 +763,6 @@  static void execlists_dequeue(struct intel_engine_cs *engine)
 		int i;
 
 		priolist_for_each_request_consume(rq, rn, p, i) {
-			GEM_BUG_ON(!assert_priority_queue(execlists, last, rq));
-
 			/*
 			 * Can we combine this request with the current port?
 			 * It has to be the same context/ringbuffer and not
@@ -766,8 +774,10 @@  static void execlists_dequeue(struct intel_engine_cs *engine)
 			 * second request, and so we never need to tell the
 			 * hardware about the first.
 			 */
-			if (last &&
-			    !can_merge_ctx(rq->hw_context, last->hw_context)) {
+			if (last && !can_merge_rq(last, rq)) {
+				if (last->hw_context == rq->hw_context)
+					goto done;
+
 				/*
 				 * If we are on the second port and cannot
 				 * combine this request with the last, then we
@@ -787,7 +797,6 @@  static void execlists_dequeue(struct intel_engine_cs *engine)
 				    ctx_single_port_submission(rq->hw_context))
 					goto done;
 
-				GEM_BUG_ON(last->hw_context == rq->hw_context);
 
 				if (submit)
 					port_assign(port, last);
@@ -827,8 +836,7 @@  static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * request triggering preemption on the next dequeue (or subsequent
 	 * interrupt for secondary ports).
 	 */
-	execlists->queue_priority_hint =
-		port != execlists->port ? rq_prio(last) : INT_MIN;
+	execlists->queue_priority_hint = queue_prio(execlists);
 
 	if (submit) {
 		port_assign(port, last);