Message ID | 20180503063757.22238-5-chris@chris-wilson.co.uk (mailing list archive)
---|---
State | New, archived
On 03/05/2018 07:36, Chris Wilson wrote:
> As we reschedule the requests, we do not want the submission tasklet
> running until we finish updating the priority chains. (We start
> rewriting priorities from the oldest, but the dequeue looks at the most
> recent in-flight, so there is a small race condition where dequeue may
> decide that preemption is falsely required.) Combine the tasklet kicking
> from adding a new request with the set-wedge protection so that we only
> have to adjust the preempt-counter once to achieve both goals.
>
> Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c     | 4 ++--
>  drivers/gpu/drm/i915/i915_request.c | 5 +----
>  2 files changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5ece6ae4bdff..03cd30001b5d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -578,10 +578,10 @@ static void __fence_set_priority(struct dma_fence *fence,
>  	rq = to_request(fence);
>  	engine = rq->engine;
>
> -	rcu_read_lock();
> +	local_bh_disable(); /* RCU serialisation for set-wedged protection */
>  	if (engine->schedule)
>  		engine->schedule(rq, attr);
> -	rcu_read_unlock();
> +	local_bh_enable(); /* kick the tasklets if queues were reprioritised */
>  }
>
>  static void fence_set_priority(struct dma_fence *fence,
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 75061f9e48eb..0756fafa7f81 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1109,12 +1109,9 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
>  	 * decide whether to preempt the entire chain so that it is ready to
>  	 * run at the earliest possible convenience.
>  	 */
> -	rcu_read_lock();
> +	local_bh_disable();
>  	if (engine->schedule)
>  		engine->schedule(request, &request->ctx->sched);
> -	rcu_read_unlock();
> -
> -	local_bh_disable();
>  	i915_sw_fence_commit(&request->submit);
>  	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */

AFAICS this doesn't disable tasklets running in parallel on other CPUs,
on different engines, so they may still see a non-atomic (wrt schedule)
snapshot of the submission queues. So I am not sure what this achieves.
It prevents a tasklet from interrupting the scheduling of this request -
but, as I said, I am not sure of the benefit.

Regards,

Tvrtko
Quoting Tvrtko Ursulin (2018-05-03 18:49:27)
> On 03/05/2018 07:36, Chris Wilson wrote:
> > As we reschedule the requests, we do not want the submission tasklet
> > running until we finish updating the priority chains. (We start
> > rewriting priorities from the oldest, but the dequeue looks at the most
> > recent in-flight, so there is a small race condition where dequeue may
> > decide that preemption is falsely required.) Combine the tasklet kicking
> > from adding a new request with the set-wedge protection so that we only
> > have to adjust the preempt-counter once to achieve both goals.
[snip patch]
> AFAICS this doesn't disable tasklets running in parallel on other CPUs,
> on different engines, so they may still see a non-atomic (wrt schedule)
> snapshot of the submission queues. So I am not sure what this achieves.
> It prevents a tasklet from interrupting the scheduling of this request -
> but, as I said, I am not sure of the benefit.

That was my "oh bother" comment as well. We don't realise the benefit of
ensuring that we always process the entire chain before a concurrent
tasklet starts processing the update; but we do coalesce the double
preempt-counter manipulation into one, and we only *actually* kick the
tasklet when rescheduling a page flip, rather than being forced to wait
for ksoftirqd. Tasklets are only fast-pathed if scheduled from interrupt
context!
-Chris
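For readers less familiar with the softirq plumbing, the distinction
Chris draws can be sketched in a few lines of kernel C. This is an
illustrative fragment, not i915 code: the demo_* names are invented,
and the three-argument DECLARE_TASKLET() matches kernels of this
vintage.

/* Illustrative only: how a tasklet kick behaves from process context. */
#include <linux/interrupt.h>

static void demo_func(unsigned long data)
{
	/* tasklet body */
}
static DECLARE_TASKLET(demo, demo_func, 0);

static void demo_kick(void)
{
	/*
	 * From plain process context this only marks HI_SOFTIRQ pending
	 * and wakes ksoftirqd; the tasklet runs whenever ksoftirqd is
	 * next scheduled.
	 */
	tasklet_hi_schedule(&demo);

	/*
	 * Inside a bh-disabled section, in_interrupt() is true, so the
	 * softirq is merely marked pending; local_bh_enable() then runs
	 * it immediately on this CPU - the "kick" Chris refers to.
	 */
	local_bh_disable();
	tasklet_hi_schedule(&demo);
	local_bh_enable();
}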
On 03/05/2018 20:50, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-05-03 18:49:27)
[snip]
> That was my "oh bother" comment as well. We don't realise the benefit of
> ensuring that we always process the entire chain before a concurrent
> tasklet starts processing the update; but we do coalesce the double
> preempt-counter manipulation into one, and we only *actually* kick the
> tasklet when rescheduling a page flip, rather than being forced to wait
> for ksoftirqd. Tasklets are only fast-pathed if scheduled from interrupt
> context!

As far as I understand it, the hunk in __fence_set_priority may help
trigger preemption sooner, rather than on the next CSB or submission
activity - so that makes some sense. I say "some" because it is a
balancing question between the time spent rescheduling with softirqs
disabled versus the alternative of simply kicking the tasklet once the
reschedule is done, outside the softirq-disabled section.

The second hunk is a bit more difficult. It creates a longer softirq-off
section, which is a slight negative, and I am unsure how much it
actually closes the race with tasklets in practice. So it may be that
the only benefit is to reduce the fiddling of both the preempt count and
local_bh_disable to just one fiddle.

Can I ask for a patch split? :)

Regards,

Tvrtko
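The alternative Tvrtko floats (reschedule under the plain RCU read lock
as before, then kick the engine's tasklet explicitly once softirqs are
enabled again) might look roughly like the sketch below. This is a
hypothetical variant, not code from the series; the execlists.tasklet
field name is assumed from the i915 tree of this period.

/* Hypothetical variant of __fence_set_priority, not from the patch. */
static void __fence_set_priority_alt(struct dma_fence *fence,
				     const struct i915_sched_attr *attr)
{
	struct i915_request *rq = to_request(fence);
	struct intel_engine_cs *engine = rq->engine;

	rcu_read_lock();	/* set-wedged protection, as before */
	if (engine->schedule)
		engine->schedule(rq, attr);
	rcu_read_unlock();

	/* Explicit kick once rescheduling is complete, softirqs enabled. */
	tasklet_hi_schedule(&engine->execlists.tasklet);
}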
Quoting Tvrtko Ursulin (2018-05-04 10:15:20)
> On 03/05/2018 20:50, Chris Wilson wrote:
[snip]
> The second hunk is a bit more difficult. It creates a longer softirq-off
> section, which is a slight negative, and I am unsure how much it
> actually closes the race with tasklets in practice. So it may be that
> the only benefit is to reduce the fiddling of both the preempt count and
> local_bh_disable to just one fiddle.

Remember that local_bh_disable() is itself just a fiddle with the
preempt count, and that the local tasklet doesn't run during the
!preemptible section anyway. So it does remove one preemption point (in
clearing the preempt count), but that is of dubious merit since we would
then need to kick the submission tasklet again immediately (from the
preempt-count pov) afterwards.
-Chris
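Chris's "just a fiddle with preempt-count" refers to the stock
implementation of these helpers. Heavily simplified (lockdep and
tracing hooks dropped, helper names renamed) from kernel/softirq.c of
this era, they look roughly like the sketch below. Note that the
counter is per-CPU, which is exactly why tasklets on other CPUs, and
hence other engines, are unaffected - Tvrtko's original observation.

#include <linux/preempt.h>
#include <linux/interrupt.h>

/* Simplified sketch of the stock helpers, not a verbatim copy. */
static inline void sketch_local_bh_disable(void)
{
	/* Just a per-CPU preempt-count bump: softirqs off locally. */
	preempt_count_add(SOFTIRQ_DISABLE_OFFSET);
}

static inline void sketch_local_bh_enable(void)
{
	/* Drop the softirq-disable count, keeping preemption off... */
	preempt_count_sub(SOFTIRQ_DISABLE_OFFSET - 1);

	/* ...so any softirq made pending meanwhile runs right here. */
	if (unlikely(!in_interrupt() && local_softirq_pending()))
		do_softirq();

	preempt_count_dec();
	preempt_check_resched();
}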
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5ece6ae4bdff..03cd30001b5d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -578,10 +578,10 @@ static void __fence_set_priority(struct dma_fence *fence,
 	rq = to_request(fence);
 	engine = rq->engine;
 
-	rcu_read_lock();
+	local_bh_disable(); /* RCU serialisation for set-wedged protection */
 	if (engine->schedule)
 		engine->schedule(rq, attr);
-	rcu_read_unlock();
+	local_bh_enable(); /* kick the tasklets if queues were reprioritised */
 }
 
 static void fence_set_priority(struct dma_fence *fence,
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 75061f9e48eb..0756fafa7f81 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1109,12 +1109,9 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
 	 * decide whether to preempt the entire chain so that it is ready to
 	 * run at the earliest possible convenience.
 	 */
-	rcu_read_lock();
+	local_bh_disable();
 	if (engine->schedule)
 		engine->schedule(request, &request->ctx->sched);
-	rcu_read_unlock();
-
-	local_bh_disable();
 	i915_sw_fence_commit(&request->submit);
 	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
As we reschedule the requests, we do not want the submission tasklet
running until we finish updating the priority chains. (We start
rewriting priorities from the oldest, but the dequeue looks at the most
recent in-flight, so there is a small race condition where dequeue may
decide that preemption is falsely required.) Combine the tasklet kicking
from adding a new request with the set-wedge protection so that we only
have to adjust the preempt-counter once to achieve both goals.

Reported-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c     | 4 ++--
 drivers/gpu/drm/i915/i915_request.c | 5 +----
 2 files changed, 3 insertions(+), 6 deletions(-)