Message ID | 20191121135131.338984-4-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/5] drm/i915: Use a ctor for TYPESAFE_BY_RCU i915_request | expand |
On 21/11/2019 13:51, Chris Wilson wrote: > In the next patch, we will introduce a new asynchronous retirement > worker, fed by execlists CS events. Here we may queue a retirement as > soon as a request is submitted to HW (and completes instantly), and we > also want to process that retirement as early as possible and cannot > afford to postpone (as there may not be another opportunity to retire it > for a few seconds). To allow the new async retirer to run in parallel > with our submission, pull the __i915_request_queue (that passes the > request to HW) inside the timelines spinlock so that the retirement > cannot release the timeline before we have completed the submission. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > --- > drivers/gpu/drm/i915/gt/intel_engine_pm.c | 29 ++++++++++++++++------- > 1 file changed, 21 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c > index 373a4b9f159c..bd0af02bea16 100644 > --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c > +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c > @@ -74,18 +74,33 @@ static inline void __timeline_mark_unlock(struct intel_context *ce, > #endif /* !IS_ENABLED(CONFIG_LOCKDEP) */ > > static void > -__intel_timeline_enter_and_release_pm(struct intel_timeline *tl, > - struct intel_engine_cs *engine) > +__queue_and_release_pm(struct i915_request *rq, > + struct intel_timeline *tl, > + struct intel_engine_cs *engine) > { > struct intel_gt_timelines *timelines = &engine->gt->timelines; > > + /* > + * We have to serialise all potential retirement paths with our > + * submission, as we don't want to underflow either the > + * engine->wakeref.counter or our timeline->active_count. > + * > + * Equally, we cannot allow a new submission to start until > + * after we finish queueing, nor could we allow that submitter > + * to retire us before we are ready! > + */ > spin_lock(&timelines->lock); > > - if (!atomic_fetch_inc(&tl->active_count)) > - list_add_tail(&tl->link, &timelines->active_list); > + /* Hand the request over to HW and so engine_retire() */ > + __i915_request_queue(rq, NULL); > > + /* Let new submissions commence (and maybe retire this timeline) */ > __intel_wakeref_defer_park(&engine->wakeref); > > + /* Let intel_gt_retire_requests() retire us */ > + if (!atomic_fetch_inc(&tl->active_count)) > + list_add_tail(&tl->link, &timelines->active_list); > + > spin_unlock(&timelines->lock); Now that everything is under the lock the order of operation is not important, or it still is? Regards, Tvrtko > } > > @@ -148,10 +163,8 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine) > rq->sched.attr.priority = I915_PRIORITY_BARRIER; > __i915_request_commit(rq); > > - __i915_request_queue(rq, NULL); > - > - /* Expose ourselves to intel_gt_retire_requests() and new submission */ > - __intel_timeline_enter_and_release_pm(ce->timeline, engine); > + /* Expose ourselves to the world */ > + __queue_and_release_pm(rq, ce->timeline, engine); > > result = false; > out_unlock: >
Quoting Tvrtko Ursulin (2019-11-21 14:42:56) > > On 21/11/2019 13:51, Chris Wilson wrote: > > In the next patch, we will introduce a new asynchronous retirement > > worker, fed by execlists CS events. Here we may queue a retirement as > > soon as a request is submitted to HW (and completes instantly), and we > > also want to process that retirement as early as possible and cannot > > afford to postpone (as there may not be another opportunity to retire it > > for a few seconds). To allow the new async retirer to run in parallel > > with our submission, pull the __i915_request_queue (that passes the > > request to HW) inside the timelines spinlock so that the retirement > > cannot release the timeline before we have completed the submission. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > > --- > > drivers/gpu/drm/i915/gt/intel_engine_pm.c | 29 ++++++++++++++++------- > > 1 file changed, 21 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c > > index 373a4b9f159c..bd0af02bea16 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c > > @@ -74,18 +74,33 @@ static inline void __timeline_mark_unlock(struct intel_context *ce, > > #endif /* !IS_ENABLED(CONFIG_LOCKDEP) */ > > > > static void > > -__intel_timeline_enter_and_release_pm(struct intel_timeline *tl, > > - struct intel_engine_cs *engine) > > +__queue_and_release_pm(struct i915_request *rq, > > + struct intel_timeline *tl, > > + struct intel_engine_cs *engine) > > { > > struct intel_gt_timelines *timelines = &engine->gt->timelines; > > > > + /* > > + * We have to serialise all potential retirement paths with our > > + * submission, as we don't want to underflow either the > > + * engine->wakeref.counter or our timeline->active_count. > > + * > > + * Equally, we cannot allow a new submission to start until > > + * after we finish queueing, nor could we allow that submitter > > + * to retire us before we are ready! > > + */ > > spin_lock(&timelines->lock); > > > > - if (!atomic_fetch_inc(&tl->active_count)) > > - list_add_tail(&tl->link, &timelines->active_list); > > + /* Hand the request over to HW and so engine_retire() */ > > + __i915_request_queue(rq, NULL); > > > > + /* Let new submissions commence (and maybe retire this timeline) */ > > __intel_wakeref_defer_park(&engine->wakeref); > > > > + /* Let intel_gt_retire_requests() retire us */ > > + if (!atomic_fetch_inc(&tl->active_count)) > > + list_add_tail(&tl->link, &timelines->active_list); > > + > > spin_unlock(&timelines->lock); > > Now that everything is under the lock the order of operation is not > important, or it still is? queue before unpark that is required. unpark and add_to_timeline, the order is flexible as the lock governors the interactions between those and retirers. So I chose to allow the next newcomer start a few instructions earlier. -Chris
On 21/11/2019 14:53, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2019-11-21 14:42:56) >> >> On 21/11/2019 13:51, Chris Wilson wrote: >>> In the next patch, we will introduce a new asynchronous retirement >>> worker, fed by execlists CS events. Here we may queue a retirement as >>> soon as a request is submitted to HW (and completes instantly), and we >>> also want to process that retirement as early as possible and cannot >>> afford to postpone (as there may not be another opportunity to retire it >>> for a few seconds). To allow the new async retirer to run in parallel >>> with our submission, pull the __i915_request_queue (that passes the >>> request to HW) inside the timelines spinlock so that the retirement >>> cannot release the timeline before we have completed the submission. >>> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>> --- >>> drivers/gpu/drm/i915/gt/intel_engine_pm.c | 29 ++++++++++++++++------- >>> 1 file changed, 21 insertions(+), 8 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>> index 373a4b9f159c..bd0af02bea16 100644 >>> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>> @@ -74,18 +74,33 @@ static inline void __timeline_mark_unlock(struct intel_context *ce, >>> #endif /* !IS_ENABLED(CONFIG_LOCKDEP) */ >>> >>> static void >>> -__intel_timeline_enter_and_release_pm(struct intel_timeline *tl, >>> - struct intel_engine_cs *engine) >>> +__queue_and_release_pm(struct i915_request *rq, >>> + struct intel_timeline *tl, >>> + struct intel_engine_cs *engine) >>> { >>> struct intel_gt_timelines *timelines = &engine->gt->timelines; >>> >>> + /* >>> + * We have to serialise all potential retirement paths with our >>> + * submission, as we don't want to underflow either the >>> + * engine->wakeref.counter or our timeline->active_count. >>> + * >>> + * Equally, we cannot allow a new submission to start until >>> + * after we finish queueing, nor could we allow that submitter >>> + * to retire us before we are ready! >>> + */ >>> spin_lock(&timelines->lock); >>> >>> - if (!atomic_fetch_inc(&tl->active_count)) >>> - list_add_tail(&tl->link, &timelines->active_list); >>> + /* Hand the request over to HW and so engine_retire() */ >>> + __i915_request_queue(rq, NULL); >>> >>> + /* Let new submissions commence (and maybe retire this timeline) */ >>> __intel_wakeref_defer_park(&engine->wakeref); >>> >>> + /* Let intel_gt_retire_requests() retire us */ >>> + if (!atomic_fetch_inc(&tl->active_count)) >>> + list_add_tail(&tl->link, &timelines->active_list); >>> + >>> spin_unlock(&timelines->lock); >> >> Now that everything is under the lock the order of operation is not >> important, or it still is? > > queue before unpark that is required. > > unpark and add_to_timeline, the order is flexible as the lock governors > the interactions between those and retirers. So I chose to allow the > next newcomer start a few instructions earlier. Yes, because of different locks. So the comment above __intel_wakeref_defer_park is not correct since timeline cannot be retired until the lock is dropped. It's only preservation of timeline ordering which mandates defer_park after request_queue. As far as I am able to summon my own understanding from yesterday. Regards, Tvrtko
Quoting Tvrtko Ursulin (2019-11-21 16:17:59) > > On 21/11/2019 14:53, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2019-11-21 14:42:56) > >> > >> On 21/11/2019 13:51, Chris Wilson wrote: > >>> In the next patch, we will introduce a new asynchronous retirement > >>> worker, fed by execlists CS events. Here we may queue a retirement as > >>> soon as a request is submitted to HW (and completes instantly), and we > >>> also want to process that retirement as early as possible and cannot > >>> afford to postpone (as there may not be another opportunity to retire it > >>> for a few seconds). To allow the new async retirer to run in parallel > >>> with our submission, pull the __i915_request_queue (that passes the > >>> request to HW) inside the timelines spinlock so that the retirement > >>> cannot release the timeline before we have completed the submission. > >>> > >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> > >>> --- > >>> drivers/gpu/drm/i915/gt/intel_engine_pm.c | 29 ++++++++++++++++------- > >>> 1 file changed, 21 insertions(+), 8 deletions(-) > >>> > >>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c > >>> index 373a4b9f159c..bd0af02bea16 100644 > >>> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c > >>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c > >>> @@ -74,18 +74,33 @@ static inline void __timeline_mark_unlock(struct intel_context *ce, > >>> #endif /* !IS_ENABLED(CONFIG_LOCKDEP) */ > >>> > >>> static void > >>> -__intel_timeline_enter_and_release_pm(struct intel_timeline *tl, > >>> - struct intel_engine_cs *engine) > >>> +__queue_and_release_pm(struct i915_request *rq, > >>> + struct intel_timeline *tl, > >>> + struct intel_engine_cs *engine) > >>> { > >>> struct intel_gt_timelines *timelines = &engine->gt->timelines; > >>> > >>> + /* > >>> + * We have to serialise all potential retirement paths with our > >>> + * submission, as we don't want to underflow either the > >>> + * engine->wakeref.counter or our timeline->active_count. > >>> + * > >>> + * Equally, we cannot allow a new submission to start until > >>> + * after we finish queueing, nor could we allow that submitter > >>> + * to retire us before we are ready! > >>> + */ > >>> spin_lock(&timelines->lock); > >>> > >>> - if (!atomic_fetch_inc(&tl->active_count)) > >>> - list_add_tail(&tl->link, &timelines->active_list); > >>> + /* Hand the request over to HW and so engine_retire() */ > >>> + __i915_request_queue(rq, NULL); > >>> > >>> + /* Let new submissions commence (and maybe retire this timeline) */ > >>> __intel_wakeref_defer_park(&engine->wakeref); > >>> > >>> + /* Let intel_gt_retire_requests() retire us */ > >>> + if (!atomic_fetch_inc(&tl->active_count)) > >>> + list_add_tail(&tl->link, &timelines->active_list); > >>> + > >>> spin_unlock(&timelines->lock); > >> > >> Now that everything is under the lock the order of operation is not > >> important, or it still is? > > > > queue before unpark that is required. > > > > unpark and add_to_timeline, the order is flexible as the lock governors > > the interactions between those and retirers. So I chose to allow the > > next newcomer start a few instructions earlier. > > Yes, because of different locks. So the comment above > __intel_wakeref_defer_park is not correct since timeline cannot be > retired until the lock is dropped. The goal was to indicate that the wakeref.count will allow new submissions to bypass the engine-pm; while also tying back to the retirement theme and reminding the reader that request submission also implies some retiring of old requests on the timeline. So I was trying to point out the connection between all steps and the act of retiring, since that was most pressing on my mind. > It's only preservation of timeline ordering which mandates defer_park > after request_queue. As far as I am able to summon my own understanding > from yesterday. Correct. That's the important bit from yesterday. -Chris
On 21/11/2019 16:24, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2019-11-21 16:17:59) >> >> On 21/11/2019 14:53, Chris Wilson wrote: >>> Quoting Tvrtko Ursulin (2019-11-21 14:42:56) >>>> >>>> On 21/11/2019 13:51, Chris Wilson wrote: >>>>> In the next patch, we will introduce a new asynchronous retirement >>>>> worker, fed by execlists CS events. Here we may queue a retirement as >>>>> soon as a request is submitted to HW (and completes instantly), and we >>>>> also want to process that retirement as early as possible and cannot >>>>> afford to postpone (as there may not be another opportunity to retire it >>>>> for a few seconds). To allow the new async retirer to run in parallel >>>>> with our submission, pull the __i915_request_queue (that passes the >>>>> request to HW) inside the timelines spinlock so that the retirement >>>>> cannot release the timeline before we have completed the submission. >>>>> >>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> >>>>> --- >>>>> drivers/gpu/drm/i915/gt/intel_engine_pm.c | 29 ++++++++++++++++------- >>>>> 1 file changed, 21 insertions(+), 8 deletions(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>> index 373a4b9f159c..bd0af02bea16 100644 >>>>> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c >>>>> @@ -74,18 +74,33 @@ static inline void __timeline_mark_unlock(struct intel_context *ce, >>>>> #endif /* !IS_ENABLED(CONFIG_LOCKDEP) */ >>>>> >>>>> static void >>>>> -__intel_timeline_enter_and_release_pm(struct intel_timeline *tl, >>>>> - struct intel_engine_cs *engine) >>>>> +__queue_and_release_pm(struct i915_request *rq, >>>>> + struct intel_timeline *tl, >>>>> + struct intel_engine_cs *engine) >>>>> { >>>>> struct intel_gt_timelines *timelines = &engine->gt->timelines; >>>>> >>>>> + /* >>>>> + * We have to serialise all potential retirement paths with our >>>>> + * submission, as we don't want to underflow either the >>>>> + * engine->wakeref.counter or our timeline->active_count. >>>>> + * >>>>> + * Equally, we cannot allow a new submission to start until >>>>> + * after we finish queueing, nor could we allow that submitter >>>>> + * to retire us before we are ready! >>>>> + */ >>>>> spin_lock(&timelines->lock); >>>>> >>>>> - if (!atomic_fetch_inc(&tl->active_count)) >>>>> - list_add_tail(&tl->link, &timelines->active_list); >>>>> + /* Hand the request over to HW and so engine_retire() */ >>>>> + __i915_request_queue(rq, NULL); >>>>> >>>>> + /* Let new submissions commence (and maybe retire this timeline) */ >>>>> __intel_wakeref_defer_park(&engine->wakeref); >>>>> >>>>> + /* Let intel_gt_retire_requests() retire us */ >>>>> + if (!atomic_fetch_inc(&tl->active_count)) >>>>> + list_add_tail(&tl->link, &timelines->active_list); >>>>> + >>>>> spin_unlock(&timelines->lock); >>>> >>>> Now that everything is under the lock the order of operation is not >>>> important, or it still is? >>> >>> queue before unpark that is required. >>> >>> unpark and add_to_timeline, the order is flexible as the lock governors >>> the interactions between those and retirers. So I chose to allow the >>> next newcomer start a few instructions earlier. >> >> Yes, because of different locks. So the comment above >> __intel_wakeref_defer_park is not correct since timeline cannot be >> retired until the lock is dropped. > > The goal was to indicate that the wakeref.count will allow new > submissions to bypass the engine-pm; while also tying back to the > retirement theme and reminding the reader that request submission also > implies some retiring of old requests on the timeline. > > So I was trying to point out the connection between all steps and the > act of retiring, since that was most pressing on my mind. > >> It's only preservation of timeline ordering which mandates defer_park >> after request_queue. As far as I am able to summon my own understanding >> from yesterday. > > Correct. That's the important bit from yesterday. Phew.. thanks for re-confirming. Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Regards, Tvrtko
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 373a4b9f159c..bd0af02bea16 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -74,18 +74,33 @@ static inline void __timeline_mark_unlock(struct intel_context *ce, #endif /* !IS_ENABLED(CONFIG_LOCKDEP) */ static void -__intel_timeline_enter_and_release_pm(struct intel_timeline *tl, - struct intel_engine_cs *engine) +__queue_and_release_pm(struct i915_request *rq, + struct intel_timeline *tl, + struct intel_engine_cs *engine) { struct intel_gt_timelines *timelines = &engine->gt->timelines; + /* + * We have to serialise all potential retirement paths with our + * submission, as we don't want to underflow either the + * engine->wakeref.counter or our timeline->active_count. + * + * Equally, we cannot allow a new submission to start until + * after we finish queueing, nor could we allow that submitter + * to retire us before we are ready! + */ spin_lock(&timelines->lock); - if (!atomic_fetch_inc(&tl->active_count)) - list_add_tail(&tl->link, &timelines->active_list); + /* Hand the request over to HW and so engine_retire() */ + __i915_request_queue(rq, NULL); + /* Let new submissions commence (and maybe retire this timeline) */ __intel_wakeref_defer_park(&engine->wakeref); + /* Let intel_gt_retire_requests() retire us */ + if (!atomic_fetch_inc(&tl->active_count)) + list_add_tail(&tl->link, &timelines->active_list); + spin_unlock(&timelines->lock); } @@ -148,10 +163,8 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine) rq->sched.attr.priority = I915_PRIORITY_BARRIER; __i915_request_commit(rq); - __i915_request_queue(rq, NULL); - - /* Expose ourselves to intel_gt_retire_requests() and new submission */ - __intel_timeline_enter_and_release_pm(ce->timeline, engine); + /* Expose ourselves to the world */ + __queue_and_release_pm(rq, ce->timeline, engine); result = false; out_unlock:
In the next patch, we will introduce a new asynchronous retirement worker, fed by execlists CS events. Here we may queue a retirement as soon as a request is submitted to HW (and completes instantly), and we also want to process that retirement as early as possible and cannot afford to postpone (as there may not be another opportunity to retire it for a few seconds). To allow the new async retirer to run in parallel with our submission, pull the __i915_request_queue (that passes the request to HW) inside the timelines spinlock so that the retirement cannot release the timeline before we have completed the submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> --- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 29 ++++++++++++++++------- 1 file changed, 21 insertions(+), 8 deletions(-)