diff mbox

[08/31] drm/i915: Move rate-limiting request retire to after submission

Message ID 20180625094842.8499-8-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson June 25, 2018, 9:48 a.m. UTC
Our long standing defense against a single client from flooding the
system with requests (causing mempressure and stalls across the whole
system) is to retire the old request on every allocation. (By retiring
the oldest, we try to keep returning requests back to the system in a
steady flow.) This adds an extra step into the submission path that we
can reduce simply by moving it to after the submission itself.

We already do try to clean up a stale request list after submission, so
always retiring all completed requests fits in as a natural extension.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

Comments

Tvrtko Ursulin June 27, 2018, 10:57 a.m. UTC | #1
On 25/06/2018 10:48, Chris Wilson wrote:
> Our long standing defense against a single client from flooding the
> system with requests (causing mempressure and stalls across the whole
> system) is to retire the old request on every allocation. (By retiring
> the oldest, we try to keep returning requests back to the system in a
> steady flow.) This adds an extra step into the submission path that we
> can reduce simply by moving it to after the submission itself.
> 
> We already do try to clean up a stale request list after submission, so
> always retiring all completed requests fits in as a natural extension.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 26 ++++++++++++++++++--------
>   1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index e1dbb544046f..e6e5eea87629 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -694,12 +694,6 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	if (ret)
>   		goto err_unreserve;
>   
> -	/* Move our oldest request to the slab-cache (if not in use!) */
> -	rq = list_first_entry(&ce->ring->request_list, typeof(*rq), ring_link);
> -	if (!list_is_last(&rq->ring_link, &ce->ring->request_list) &&
> -	    i915_request_completed(rq))
> -		i915_request_retire(rq);
> -
>   	/*
>   	 * Beware: Dragons be flying overhead.
>   	 *
> @@ -1110,6 +1104,8 @@ void i915_request_add(struct i915_request *request)
>   	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
>   
>   	/*
> +	 * Move our oldest requests to the slab-cache (if not in use!)
> +	 *
>   	 * In typical scenarios, we do not expect the previous request on
>   	 * the timeline to be still tracked by timeline->last_request if it
>   	 * has been completed. If the completed request is still here, that
> @@ -1126,8 +1122,22 @@ void i915_request_add(struct i915_request *request)
>   	 * work on behalf of others -- but instead we should benefit from
>   	 * improved resource management. (Well, that's the theory at least.)
>   	 */
> -	if (prev && i915_request_completed(prev))
> -		i915_request_retire_upto(prev);
> +	do {
> +		prev = list_first_entry(&ring->request_list,
> +					typeof(*prev), ring_link);
> +
> +		/*
> +		 * Keep the current request, the caller may not be
> +		 * expecting it to be retired (and freed!) immediately,
> +		 * and preserving one request from the client allows us to
> +		 * carry forward frequently reused state onto the next
> +		 * submission.
> +		 */
> +		if (prev == request || !i915_request_completed(prev))
> +			break;
> +
> +		i915_request_retire(prev);
> +	} while (1);

Maybe new helper i915_request_try_retire_upto(prev)?

>   }
>   
>   static unsigned long local_clock_us(unsigned int *cpu)
> 

Cost benefit? Is it really so interesting to keep tweaking this? I feel 
like I can stamp an r-b with the "yeah whatever" approach.. but the 
commit doesn't say what we gain to explain why it is useful to spend 
time reviewing it.

Regards,

Tvrtko
Chris Wilson June 27, 2018, 11:16 a.m. UTC | #2
Quoting Tvrtko Ursulin (2018-06-27 11:57:39)
> 
> On 25/06/2018 10:48, Chris Wilson wrote:
> > Our long standing defense against a single client from flooding the
> > system with requests (causing mempressure and stalls across the whole
> > system) is to retire the old request on every allocation. (By retiring
> > the oldest, we try to keep returning requests back to the system in a
> > steady flow.) This adds an extra step into the submission path that we
> > can reduce simply by moving it to after the submission itself.
> > 
> > We already do try to clean up a stale request list after submission, so
> > always retiring all completed requests fits in as a natural extension.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_request.c | 26 ++++++++++++++++++--------
> >   1 file changed, 18 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index e1dbb544046f..e6e5eea87629 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -694,12 +694,6 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> >       if (ret)
> >               goto err_unreserve;
> >   
> > -     /* Move our oldest request to the slab-cache (if not in use!) */
> > -     rq = list_first_entry(&ce->ring->request_list, typeof(*rq), ring_link);
> > -     if (!list_is_last(&rq->ring_link, &ce->ring->request_list) &&
> > -         i915_request_completed(rq))
> > -             i915_request_retire(rq);
> > -
> >       /*
> >        * Beware: Dragons be flying overhead.
> >        *
> > @@ -1110,6 +1104,8 @@ void i915_request_add(struct i915_request *request)
> >       local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
> >   
> >       /*
> > +      * Move our oldest requests to the slab-cache (if not in use!)
> > +      *
> >        * In typical scenarios, we do not expect the previous request on
> >        * the timeline to be still tracked by timeline->last_request if it
> >        * has been completed. If the completed request is still here, that
> > @@ -1126,8 +1122,22 @@ void i915_request_add(struct i915_request *request)
> >        * work on behalf of others -- but instead we should benefit from
> >        * improved resource management. (Well, that's the theory at least.)
> >        */
> > -     if (prev && i915_request_completed(prev))
> > -             i915_request_retire_upto(prev);
> > +     do {
> > +             prev = list_first_entry(&ring->request_list,
> > +                                     typeof(*prev), ring_link);
> > +
> > +             /*
> > +              * Keep the current request, the caller may not be
> > +              * expecting it to be retired (and freed!) immediately,
> > +              * and preserving one request from the client allows us to
> > +              * carry forward frequently reused state onto the next
> > +              * submission.
> > +              */
> > +             if (prev == request || !i915_request_completed(prev))
> > +                     break;
> > +
> > +             i915_request_retire(prev);
> > +     } while (1);
> 
> Maybe new helper i915_request_try_retire_upto(prev)?

try_retire_before() I'm just feeling confusion in the name. Not yet
sold, and certainly don't want to invite more users :)

> >   static unsigned long local_clock_us(unsigned int *cpu)
> > 
> 
> Cost benefit? Is it really so interesting to keep tweaking this? I feel 
> like I can stamp an r-b with the "yeah whatever" approach.. but the 
> commit doesn't say what we gain to explain why it is useful to spend 
> time reviewing it.

The true cost was the contention the earlier retirement was causing with
the still inflight ELSP. The cost of that contention is less with the
current series, but the implication was made.
-Chris
Tvrtko Ursulin June 27, 2018, 1:28 p.m. UTC | #3
On 27/06/2018 12:16, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-06-27 11:57:39)
>>
>> On 25/06/2018 10:48, Chris Wilson wrote:
>>> Our long standing defense against a single client from flooding the
>>> system with requests (causing mempressure and stalls across the whole
>>> system) is to retire the old request on every allocation. (By retiring
>>> the oldest, we try to keep returning requests back to the system in a
>>> steady flow.) This adds an extra step into the submission path that we
>>> can reduce simply by moving it to after the submission itself.
>>>
>>> We already do try to clean up a stale request list after submission, so
>>> always retiring all completed requests fits in as a natural extension.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_request.c | 26 ++++++++++++++++++--------
>>>    1 file changed, 18 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>> index e1dbb544046f..e6e5eea87629 100644
>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>> @@ -694,12 +694,6 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>>>        if (ret)
>>>                goto err_unreserve;
>>>    
>>> -     /* Move our oldest request to the slab-cache (if not in use!) */
>>> -     rq = list_first_entry(&ce->ring->request_list, typeof(*rq), ring_link);
>>> -     if (!list_is_last(&rq->ring_link, &ce->ring->request_list) &&
>>> -         i915_request_completed(rq))
>>> -             i915_request_retire(rq);
>>> -
>>>        /*
>>>         * Beware: Dragons be flying overhead.
>>>         *
>>> @@ -1110,6 +1104,8 @@ void i915_request_add(struct i915_request *request)
>>>        local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
>>>    
>>>        /*
>>> +      * Move our oldest requests to the slab-cache (if not in use!)
>>> +      *
>>>         * In typical scenarios, we do not expect the previous request on
>>>         * the timeline to be still tracked by timeline->last_request if it
>>>         * has been completed. If the completed request is still here, that
>>> @@ -1126,8 +1122,22 @@ void i915_request_add(struct i915_request *request)
>>>         * work on behalf of others -- but instead we should benefit from
>>>         * improved resource management. (Well, that's the theory at least.)
>>>         */
>>> -     if (prev && i915_request_completed(prev))
>>> -             i915_request_retire_upto(prev);
>>> +     do {
>>> +             prev = list_first_entry(&ring->request_list,
>>> +                                     typeof(*prev), ring_link);
>>> +
>>> +             /*
>>> +              * Keep the current request, the caller may not be
>>> +              * expecting it to be retired (and freed!) immediately,
>>> +              * and preserving one request from the client allows us to
>>> +              * carry forward frequently reused state onto the next
>>> +              * submission.
>>> +              */
>>> +             if (prev == request || !i915_request_completed(prev))
>>> +                     break;
>>> +
>>> +             i915_request_retire(prev);
>>> +     } while (1);
>>
>> Maybe new helper i915_request_try_retire_upto(prev)?
> 
> try_retire_before() I'm just feeling confusion in the name. Not yet
> sold, and certainly don't want to invite more users :)

It would be local so what's there to lose.

>>>    static unsigned long local_clock_us(unsigned int *cpu)
>>>
>>
>> Cost benefit? Is it really so interesting to keep tweaking this? I feel
>> like I can stamp an r-b with the "yeah whatever" approach.. but the
>> commit doesn't say what we gain to explain why it is useful to spend
>> time reviewing it.
> 
> The true cost was the contention the earlier retirement was causing with
> the still inflight ELSP. The cost of that contention is less with the
> current series, but the implication was made.

What kind of contention? On the timeline lock? How can it be less to 
contention to retire all completed requests versus only the oldest and 
previous?

Regards,

Tvrtko
Chris Wilson June 27, 2018, 1:37 p.m. UTC | #4
Quoting Tvrtko Ursulin (2018-06-27 14:28:08)
> 
> On 27/06/2018 12:16, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-06-27 11:57:39)
> >> Cost benefit? Is it really so interesting to keep tweaking this? I feel
> >> like I can stamp an r-b with the "yeah whatever" approach.. but the
> >> commit doesn't say what we gain to explain why it is useful to spend
> >> time reviewing it.
> > 
> > The true cost was the contention the earlier retirement was causing with
> > the still inflight ELSP. The cost of that contention is less with the
> > current series, but the implication was made.
> 
> What kind of contention? On the timeline lock? How can it be less to 
> contention to retire all completed requests versus only the oldest and 
> previous?

The problem is how frequently we couldn't retire the oldest, it started
off with a spinwait for the CSB event (since we had the breadcrumb, knew
an arbitration point was coming up, it shouldn't take long, right?).
Yes, in general, contention on the timeline.lock is something to be
concerned about as it shows up frequently on the throughput profiles
(and I'm already looking at how we might split the queue into its own
lock to see if that helps). But I also have to make sure I weight
latency just as heavily as, if not more, throughput.
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index e1dbb544046f..e6e5eea87629 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -694,12 +694,6 @@  i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	if (ret)
 		goto err_unreserve;
 
-	/* Move our oldest request to the slab-cache (if not in use!) */
-	rq = list_first_entry(&ce->ring->request_list, typeof(*rq), ring_link);
-	if (!list_is_last(&rq->ring_link, &ce->ring->request_list) &&
-	    i915_request_completed(rq))
-		i915_request_retire(rq);
-
 	/*
 	 * Beware: Dragons be flying overhead.
 	 *
@@ -1110,6 +1104,8 @@  void i915_request_add(struct i915_request *request)
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
 
 	/*
+	 * Move our oldest requests to the slab-cache (if not in use!)
+	 *
 	 * In typical scenarios, we do not expect the previous request on
 	 * the timeline to be still tracked by timeline->last_request if it
 	 * has been completed. If the completed request is still here, that
@@ -1126,8 +1122,22 @@  void i915_request_add(struct i915_request *request)
 	 * work on behalf of others -- but instead we should benefit from
 	 * improved resource management. (Well, that's the theory at least.)
 	 */
-	if (prev && i915_request_completed(prev))
-		i915_request_retire_upto(prev);
+	do {
+		prev = list_first_entry(&ring->request_list,
+					typeof(*prev), ring_link);
+
+		/*
+		 * Keep the current request, the caller may not be
+		 * expecting it to be retired (and freed!) immediately,
+		 * and preserving one request from the client allows us to
+		 * carry forward frequently reused state onto the next
+		 * submission.
+		 */
+		if (prev == request || !i915_request_completed(prev))
+			break;
+
+		i915_request_retire(prev);
+	} while (1);
 }
 
 static unsigned long local_clock_us(unsigned int *cpu)