
[4/5] drm/i915/gt: Adopt engine_park synchronisation rules for engine_retire

Message ID 20191121135131.338984-4-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Series [1/5] drm/i915: Use a ctor for TYPESAFE_BY_RCU i915_request

Commit Message

Chris Wilson Nov. 21, 2019, 1:51 p.m. UTC
In the next patch, we will introduce a new asynchronous retirement
worker, fed by execlists CS events. Here we may queue a retirement as
soon as a request is submitted to HW (and completes instantly), and we
also want to process that retirement as early as possible and cannot
afford to postpone (as there may not be another opportunity to retire it
for a few seconds). To allow the new async retirer to run in parallel
with our submission, pull the __i915_request_queue (that passes the
request to HW) inside the timelines spinlock so that the retirement
cannot release the timeline before we have completed the submission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_pm.c | 29 ++++++++++++++++-------
 1 file changed, 21 insertions(+), 8 deletions(-)
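
[Editorial note] As a rough aid for readers of the thread, below is a minimal
user-space model of the locking scheme this patch introduces. Every name in it
(struct engine, struct timeline, queue_and_release_pm(), the pthread spinlock
standing in for gt->timelines.lock, the counters standing in for
engine->wakeref.counter and tl->active_count) is a hypothetical stand-in, not
i915 code; the sketch only illustrates that everything a retirer could observe
is now published inside a single critical section.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    struct timeline {
            atomic_int active_count;        /* stand-in for tl->active_count */
            int on_active_list;             /* stand-in for list_add_tail(&tl->link, ...) */
    };

    struct engine {
            pthread_spinlock_t timelines_lock;      /* stand-in for gt->timelines.lock */
            atomic_int wakeref_count;               /* stand-in for engine->wakeref.counter */
    };

    /*
     * Models __queue_and_release_pm(): queueing to HW, releasing engine-pm and
     * entering the timeline all happen under the same lock, so a retirer that
     * serialises on this lock cannot drop the timeline halfway through.
     */
    static void queue_and_release_pm(struct engine *e, struct timeline *tl)
    {
            pthread_spin_lock(&e->timelines_lock);

            /* 1. hand the request to HW (modelled as a no-op here) */

            /* 2. publish a non-zero wakeref so new submitters skip parking
             *    (stand-in for __intel_wakeref_defer_park()) */
            atomic_fetch_add(&e->wakeref_count, 1);

            /* 3. expose the timeline to the retire worker
             *    (stand-in for the atomic_fetch_inc / list_add_tail pair) */
            if (atomic_fetch_add(&tl->active_count, 1) == 0)
                    tl->on_active_list = 1;

            pthread_spin_unlock(&e->timelines_lock);
    }

    int main(void)
    {
            struct engine e = { .wakeref_count = 0 };
            struct timeline tl = { .active_count = 0 };

            pthread_spin_init(&e.timelines_lock, PTHREAD_PROCESS_PRIVATE);
            queue_and_release_pm(&e, &tl);
            printf("wakeref=%d active=%d listed=%d\n",
                   atomic_load(&e.wakeref_count),
                   atomic_load(&tl.active_count), tl.on_active_list);
            pthread_spin_destroy(&e.timelines_lock);
            return 0;
    }

It builds with cc -pthread and merely prints the final counter values.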

Comments

Tvrtko Ursulin Nov. 21, 2019, 2:42 p.m. UTC | #1
On 21/11/2019 13:51, Chris Wilson wrote:
> In the next patch, we will introduce a new asynchronous retirement
> worker, fed by execlists CS events. Here we may queue a retirement as
> soon as a request is submitted to HW (and completes instantly), and we
> also want to process that retirement as early as possible and cannot
> afford to postpone (as there may not be another opportunity to retire it
> for a few seconds). To allow the new async retirer to run in parallel
> with our submission, pull the __i915_request_queue (that passes the
> request to HW) inside the timelines spinlock so that the retirement
> cannot release the timeline before we have completed the submission.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_engine_pm.c | 29 ++++++++++++++++-------
>   1 file changed, 21 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> index 373a4b9f159c..bd0af02bea16 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> @@ -74,18 +74,33 @@ static inline void __timeline_mark_unlock(struct intel_context *ce,
>   #endif /* !IS_ENABLED(CONFIG_LOCKDEP) */
>   
>   static void
> -__intel_timeline_enter_and_release_pm(struct intel_timeline *tl,
> -				      struct intel_engine_cs *engine)
> +__queue_and_release_pm(struct i915_request *rq,
> +		       struct intel_timeline *tl,
> +		       struct intel_engine_cs *engine)
>   {
>   	struct intel_gt_timelines *timelines = &engine->gt->timelines;
>   
> +	/*
> +	 * We have to serialise all potential retirement paths with our
> +	 * submission, as we don't want to underflow either the
> +	 * engine->wakeref.counter or our timeline->active_count.
> +	 *
> +	 * Equally, we cannot allow a new submission to start until
> +	 * after we finish queueing, nor could we allow that submitter
> +	 * to retire us before we are ready!
> +	 */
>   	spin_lock(&timelines->lock);
>   
> -	if (!atomic_fetch_inc(&tl->active_count))
> -		list_add_tail(&tl->link, &timelines->active_list);
> +	/* Hand the request over to HW and so engine_retire() */
> +	__i915_request_queue(rq, NULL);
>   
> +	/* Let new submissions commence (and maybe retire this timeline) */
>   	__intel_wakeref_defer_park(&engine->wakeref);
>   
> +	/* Let intel_gt_retire_requests() retire us */
> +	if (!atomic_fetch_inc(&tl->active_count))
> +		list_add_tail(&tl->link, &timelines->active_list);
> +
>   	spin_unlock(&timelines->lock);

Now that everything is under the lock, is the order of operations no longer 
important, or does it still matter?

Regards,

Tvrtko

>   }
>   
> @@ -148,10 +163,8 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine)
>   	rq->sched.attr.priority = I915_PRIORITY_BARRIER;
>   	__i915_request_commit(rq);
>   
> -	__i915_request_queue(rq, NULL);
> -
> -	/* Expose ourselves to intel_gt_retire_requests() and new submission */
> -	__intel_timeline_enter_and_release_pm(ce->timeline, engine);
> +	/* Expose ourselves to the world */
> +	__queue_and_release_pm(rq, ce->timeline, engine);
>   
>   	result = false;
>   out_unlock:
>
Chris Wilson Nov. 21, 2019, 2:53 p.m. UTC | #2
Quoting Tvrtko Ursulin (2019-11-21 14:42:56)
> 
> On 21/11/2019 13:51, Chris Wilson wrote:
> > In the next patch, we will introduce a new asynchronous retirement
> > worker, fed by execlists CS events. Here we may queue a retirement as
> > soon as a request is submitted to HW (and completes instantly), and we
> > also want to process that retirement as early as possible and cannot
> > afford to postpone (as there may not be another opportunity to retire it
> > for a few seconds). To allow the new async retirer to run in parallel
> > with our submission, pull the __i915_request_queue (that passes the
> > request to HW) inside the timelines spinlock so that the retirement
> > cannot release the timeline before we have completed the submission.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_engine_pm.c | 29 ++++++++++++++++-------
> >   1 file changed, 21 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > index 373a4b9f159c..bd0af02bea16 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> > @@ -74,18 +74,33 @@ static inline void __timeline_mark_unlock(struct intel_context *ce,
> >   #endif /* !IS_ENABLED(CONFIG_LOCKDEP) */
> >   
> >   static void
> > -__intel_timeline_enter_and_release_pm(struct intel_timeline *tl,
> > -                                   struct intel_engine_cs *engine)
> > +__queue_and_release_pm(struct i915_request *rq,
> > +                    struct intel_timeline *tl,
> > +                    struct intel_engine_cs *engine)
> >   {
> >       struct intel_gt_timelines *timelines = &engine->gt->timelines;
> >   
> > +     /*
> > +      * We have to serialise all potential retirement paths with our
> > +      * submission, as we don't want to underflow either the
> > +      * engine->wakeref.counter or our timeline->active_count.
> > +      *
> > +      * Equally, we cannot allow a new submission to start until
> > +      * after we finish queueing, nor could we allow that submitter
> > +      * to retire us before we are ready!
> > +      */
> >       spin_lock(&timelines->lock);
> >   
> > -     if (!atomic_fetch_inc(&tl->active_count))
> > -             list_add_tail(&tl->link, &timelines->active_list);
> > +     /* Hand the request over to HW and so engine_retire() */
> > +     __i915_request_queue(rq, NULL);
> >   
> > +     /* Let new submissions commence (and maybe retire this timeline) */
> >       __intel_wakeref_defer_park(&engine->wakeref);
> >   
> > +     /* Let intel_gt_retire_requests() retire us */
> > +     if (!atomic_fetch_inc(&tl->active_count))
> > +             list_add_tail(&tl->link, &timelines->active_list);
> > +
> >       spin_unlock(&timelines->lock);
> 
> Now that everything is under the lock the order of operation is not 
> important, or it still is?

Queue before unpark, that is what is required.

For unpark and add_to_timeline, the order is flexible as the lock governs
the interactions between those and the retirers. So I chose to let the
next newcomer start a few instructions earlier.
-Chris
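
[Editorial note] A rough sketch of the flexibility described above, reusing the
hypothetical struct engine / struct timeline model from the note after the
diffstat (so this fragment is not i915 code and not self-contained on its own):
because any retirer must take the same lock, the two orderings below publish
identical state before the lock is dropped; only step (1), queueing to HW, is
ordering-sensitive, since a new submitter unblocked by the wakeref queues to HW
under a different lock.

    /*
     * Hypothetical sketch, not i915 code: a retirer serialised on
     * timelines_lock sees the same final state whichever of the two
     * orderings below is used. Only step (1) must come first, to preserve
     * timeline ordering against submitters running under other locks.
     */
    static void unpark_then_enter(struct engine *e, struct timeline *tl)
    {
            pthread_spin_lock(&e->timelines_lock);
            /* (1) queue the request to HW, must come first */
            atomic_fetch_add(&e->wakeref_count, 1);   /* (2) release engine-pm  */
            atomic_fetch_add(&tl->active_count, 1);   /* (3) enter the timeline */
            pthread_spin_unlock(&e->timelines_lock);
    }

    static void enter_then_unpark(struct engine *e, struct timeline *tl)
    {
            pthread_spin_lock(&e->timelines_lock);
            /* (1) queue the request to HW, must come first */
            atomic_fetch_add(&tl->active_count, 1);   /* (3) enter the timeline */
            atomic_fetch_add(&e->wakeref_count, 1);   /* (2) release engine-pm  */
            pthread_spin_unlock(&e->timelines_lock);
    }
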
Tvrtko Ursulin Nov. 21, 2019, 4:17 p.m. UTC | #3
On 21/11/2019 14:53, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-11-21 14:42:56)
>>
>> On 21/11/2019 13:51, Chris Wilson wrote:
>>> In the next patch, we will introduce a new asynchronous retirement
>>> worker, fed by execlists CS events. Here we may queue a retirement as
>>> soon as a request is submitted to HW (and completes instantly), and we
>>> also want to process that retirement as early as possible and cannot
>>> afford to postpone (as there may not be another opportunity to retire it
>>> for a few seconds). To allow the new async retirer to run in parallel
>>> with our submission, pull the __i915_request_queue (that passes the
>>> request to HW) inside the timelines spinlock so that the retirement
>>> cannot release the timeline before we have completed the submission.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_engine_pm.c | 29 ++++++++++++++++-------
>>>    1 file changed, 21 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
>>> index 373a4b9f159c..bd0af02bea16 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
>>> @@ -74,18 +74,33 @@ static inline void __timeline_mark_unlock(struct intel_context *ce,
>>>    #endif /* !IS_ENABLED(CONFIG_LOCKDEP) */
>>>    
>>>    static void
>>> -__intel_timeline_enter_and_release_pm(struct intel_timeline *tl,
>>> -                                   struct intel_engine_cs *engine)
>>> +__queue_and_release_pm(struct i915_request *rq,
>>> +                    struct intel_timeline *tl,
>>> +                    struct intel_engine_cs *engine)
>>>    {
>>>        struct intel_gt_timelines *timelines = &engine->gt->timelines;
>>>    
>>> +     /*
>>> +      * We have to serialise all potential retirement paths with our
>>> +      * submission, as we don't want to underflow either the
>>> +      * engine->wakeref.counter or our timeline->active_count.
>>> +      *
>>> +      * Equally, we cannot allow a new submission to start until
>>> +      * after we finish queueing, nor could we allow that submitter
>>> +      * to retire us before we are ready!
>>> +      */
>>>        spin_lock(&timelines->lock);
>>>    
>>> -     if (!atomic_fetch_inc(&tl->active_count))
>>> -             list_add_tail(&tl->link, &timelines->active_list);
>>> +     /* Hand the request over to HW and so engine_retire() */
>>> +     __i915_request_queue(rq, NULL);
>>>    
>>> +     /* Let new submissions commence (and maybe retire this timeline) */
>>>        __intel_wakeref_defer_park(&engine->wakeref);
>>>    
>>> +     /* Let intel_gt_retire_requests() retire us */
>>> +     if (!atomic_fetch_inc(&tl->active_count))
>>> +             list_add_tail(&tl->link, &timelines->active_list);
>>> +
>>>        spin_unlock(&timelines->lock);
>>
>> Now that everything is under the lock the order of operation is not
>> important, or it still is?
> 
> queue before unpark that is required.
> 
> unpark and add_to_timeline, the order is flexible as the lock governors
> the interactions between those and retirers. So I chose to allow the
> next newcomer start a few instructions earlier.

Yes, because of different locks. So the comment above 
__intel_wakeref_defer_park is not correct, since the timeline cannot be 
retired until the lock is dropped.

It's only the preservation of timeline ordering which mandates defer_park 
after request_queue. At least as far as I am able to summon my own 
understanding from yesterday.

Regards,

Tvrtko
Chris Wilson Nov. 21, 2019, 4:24 p.m. UTC | #4
Quoting Tvrtko Ursulin (2019-11-21 16:17:59)
> 
> On 21/11/2019 14:53, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-11-21 14:42:56)
> >>
> >> On 21/11/2019 13:51, Chris Wilson wrote:
> >>> In the next patch, we will introduce a new asynchronous retirement
> >>> worker, fed by execlists CS events. Here we may queue a retirement as
> >>> soon as a request is submitted to HW (and completes instantly), and we
> >>> also want to process that retirement as early as possible and cannot
> >>> afford to postpone (as there may not be another opportunity to retire it
> >>> for a few seconds). To allow the new async retirer to run in parallel
> >>> with our submission, pull the __i915_request_queue (that passes the
> >>> request to HW) inside the timelines spinlock so that the retirement
> >>> cannot release the timeline before we have completed the submission.
> >>>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/gt/intel_engine_pm.c | 29 ++++++++++++++++-------
> >>>    1 file changed, 21 insertions(+), 8 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> >>> index 373a4b9f159c..bd0af02bea16 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> >>> @@ -74,18 +74,33 @@ static inline void __timeline_mark_unlock(struct intel_context *ce,
> >>>    #endif /* !IS_ENABLED(CONFIG_LOCKDEP) */
> >>>    
> >>>    static void
> >>> -__intel_timeline_enter_and_release_pm(struct intel_timeline *tl,
> >>> -                                   struct intel_engine_cs *engine)
> >>> +__queue_and_release_pm(struct i915_request *rq,
> >>> +                    struct intel_timeline *tl,
> >>> +                    struct intel_engine_cs *engine)
> >>>    {
> >>>        struct intel_gt_timelines *timelines = &engine->gt->timelines;
> >>>    
> >>> +     /*
> >>> +      * We have to serialise all potential retirement paths with our
> >>> +      * submission, as we don't want to underflow either the
> >>> +      * engine->wakeref.counter or our timeline->active_count.
> >>> +      *
> >>> +      * Equally, we cannot allow a new submission to start until
> >>> +      * after we finish queueing, nor could we allow that submitter
> >>> +      * to retire us before we are ready!
> >>> +      */
> >>>        spin_lock(&timelines->lock);
> >>>    
> >>> -     if (!atomic_fetch_inc(&tl->active_count))
> >>> -             list_add_tail(&tl->link, &timelines->active_list);
> >>> +     /* Hand the request over to HW and so engine_retire() */
> >>> +     __i915_request_queue(rq, NULL);
> >>>    
> >>> +     /* Let new submissions commence (and maybe retire this timeline) */
> >>>        __intel_wakeref_defer_park(&engine->wakeref);
> >>>    
> >>> +     /* Let intel_gt_retire_requests() retire us */
> >>> +     if (!atomic_fetch_inc(&tl->active_count))
> >>> +             list_add_tail(&tl->link, &timelines->active_list);
> >>> +
> >>>        spin_unlock(&timelines->lock);
> >>
> >> Now that everything is under the lock the order of operation is not
> >> important, or it still is?
> > 
> > queue before unpark that is required.
> > 
> > unpark and add_to_timeline, the order is flexible as the lock governors
> > the interactions between those and retirers. So I chose to allow the
> > next newcomer start a few instructions earlier.
> 
> Yes, because of different locks. So the comment above 
> __intel_wakeref_defer_park is not correct since timeline cannot be 
> retired until the lock is dropped.

The goal was to indicate that the wakeref.count will allow new
submissions to bypass the engine-pm, while also tying back to the
retirement theme and reminding the reader that request submission also
implies some retiring of old requests on the timeline.

So I was trying to point out the connection between all steps and the
act of retiring, since that was most pressing on my mind.

> It's only preservation of timeline ordering which mandates defer_park 
> after request_queue. As far as I am able to summon my own understanding 
> from yesterday.

Correct. That's the important bit from yesterday.
-Chris
Tvrtko Ursulin Nov. 21, 2019, 4:31 p.m. UTC | #5
On 21/11/2019 16:24, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-11-21 16:17:59)
>>
>> On 21/11/2019 14:53, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2019-11-21 14:42:56)
>>>>
>>>> On 21/11/2019 13:51, Chris Wilson wrote:
>>>>> In the next patch, we will introduce a new asynchronous retirement
>>>>> worker, fed by execlists CS events. Here we may queue a retirement as
>>>>> soon as a request is submitted to HW (and completes instantly), and we
>>>>> also want to process that retirement as early as possible and cannot
>>>>> afford to postpone (as there may not be another opportunity to retire it
>>>>> for a few seconds). To allow the new async retirer to run in parallel
>>>>> with our submission, pull the __i915_request_queue (that passes the
>>>>> request to HW) inside the timelines spinlock so that the retirement
>>>>> cannot release the timeline before we have completed the submission.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>> ---
>>>>>     drivers/gpu/drm/i915/gt/intel_engine_pm.c | 29 ++++++++++++++++-------
>>>>>     1 file changed, 21 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
>>>>> index 373a4b9f159c..bd0af02bea16 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
>>>>> @@ -74,18 +74,33 @@ static inline void __timeline_mark_unlock(struct intel_context *ce,
>>>>>     #endif /* !IS_ENABLED(CONFIG_LOCKDEP) */
>>>>>     
>>>>>     static void
>>>>> -__intel_timeline_enter_and_release_pm(struct intel_timeline *tl,
>>>>> -                                   struct intel_engine_cs *engine)
>>>>> +__queue_and_release_pm(struct i915_request *rq,
>>>>> +                    struct intel_timeline *tl,
>>>>> +                    struct intel_engine_cs *engine)
>>>>>     {
>>>>>         struct intel_gt_timelines *timelines = &engine->gt->timelines;
>>>>>     
>>>>> +     /*
>>>>> +      * We have to serialise all potential retirement paths with our
>>>>> +      * submission, as we don't want to underflow either the
>>>>> +      * engine->wakeref.counter or our timeline->active_count.
>>>>> +      *
>>>>> +      * Equally, we cannot allow a new submission to start until
>>>>> +      * after we finish queueing, nor could we allow that submitter
>>>>> +      * to retire us before we are ready!
>>>>> +      */
>>>>>         spin_lock(&timelines->lock);
>>>>>     
>>>>> -     if (!atomic_fetch_inc(&tl->active_count))
>>>>> -             list_add_tail(&tl->link, &timelines->active_list);
>>>>> +     /* Hand the request over to HW and so engine_retire() */
>>>>> +     __i915_request_queue(rq, NULL);
>>>>>     
>>>>> +     /* Let new submissions commence (and maybe retire this timeline) */
>>>>>         __intel_wakeref_defer_park(&engine->wakeref);
>>>>>     
>>>>> +     /* Let intel_gt_retire_requests() retire us */
>>>>> +     if (!atomic_fetch_inc(&tl->active_count))
>>>>> +             list_add_tail(&tl->link, &timelines->active_list);
>>>>> +
>>>>>         spin_unlock(&timelines->lock);
>>>>
>>>> Now that everything is under the lock the order of operation is not
>>>> important, or it still is?
>>>
>>> queue before unpark that is required.
>>>
>>> unpark and add_to_timeline, the order is flexible as the lock governors
>>> the interactions between those and retirers. So I chose to allow the
>>> next newcomer start a few instructions earlier.
>>
>> Yes, because of different locks. So the comment above
>> __intel_wakeref_defer_park is not correct since timeline cannot be
>> retired until the lock is dropped.
> 
> The goal was to indicate that the wakeref.count will allow new
> submissions to bypass the engine-pm; while also tying back to the
> retirement theme and reminding the reader that request submission also
> implies some retiring of old requests on the timeline.
> 
> So I was trying to point out the connection between all steps and the
> act of retiring, since that was most pressing on my mind.
> 
>> It's only preservation of timeline ordering which mandates defer_park
>> after request_queue. As far as I am able to summon my own understanding
>> from yesterday.
> 
> Correct. That's the important bit from yesterday.

Phew.. thanks for re-confirming.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

Patch

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 373a4b9f159c..bd0af02bea16 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -74,18 +74,33 @@  static inline void __timeline_mark_unlock(struct intel_context *ce,
 #endif /* !IS_ENABLED(CONFIG_LOCKDEP) */
 
 static void
-__intel_timeline_enter_and_release_pm(struct intel_timeline *tl,
-				      struct intel_engine_cs *engine)
+__queue_and_release_pm(struct i915_request *rq,
+		       struct intel_timeline *tl,
+		       struct intel_engine_cs *engine)
 {
 	struct intel_gt_timelines *timelines = &engine->gt->timelines;
 
+	/*
+	 * We have to serialise all potential retirement paths with our
+	 * submission, as we don't want to underflow either the
+	 * engine->wakeref.counter or our timeline->active_count.
+	 *
+	 * Equally, we cannot allow a new submission to start until
+	 * after we finish queueing, nor could we allow that submitter
+	 * to retire us before we are ready!
+	 */
 	spin_lock(&timelines->lock);
 
-	if (!atomic_fetch_inc(&tl->active_count))
-		list_add_tail(&tl->link, &timelines->active_list);
+	/* Hand the request over to HW and so engine_retire() */
+	__i915_request_queue(rq, NULL);
 
+	/* Let new submissions commence (and maybe retire this timeline) */
 	__intel_wakeref_defer_park(&engine->wakeref);
 
+	/* Let intel_gt_retire_requests() retire us */
+	if (!atomic_fetch_inc(&tl->active_count))
+		list_add_tail(&tl->link, &timelines->active_list);
+
 	spin_unlock(&timelines->lock);
 }
 
@@ -148,10 +163,8 @@  static bool switch_to_kernel_context(struct intel_engine_cs *engine)
 	rq->sched.attr.priority = I915_PRIORITY_BARRIER;
 	__i915_request_commit(rq);
 
-	__i915_request_queue(rq, NULL);
-
-	/* Expose ourselves to intel_gt_retire_requests() and new submission */
-	__intel_timeline_enter_and_release_pm(ce->timeline, engine);
+	/* Expose ourselves to the world */
+	__queue_and_release_pm(rq, ce->timeline, engine);
 
 	result = false;
 out_unlock: