Message ID | 20190917074312.12290-1-chris@chris-wilson.co.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/i915: Lock signaler timeline while navigating | expand |
On 17/09/2019 08:43, Chris Wilson wrote: > As we need to take a walk back along the signaler timeline to find the > fence before upon which we want to wait, we need to lock that timeline > to prevent it being modified as we walk. Similarly, we also need to > acquire a reference to the earlier fence while it still exists! > > Though we lack the correct locking today, we are saved by the > overarching struct_mutex -- but that protection is being removed. > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > --- > drivers/gpu/drm/i915/i915_request.c | 30 +++++++++++++++++++++++------ > 1 file changed, 24 insertions(+), 6 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c > index f12358150097..452ad7a8ff0c 100644 > --- a/drivers/gpu/drm/i915/i915_request.c > +++ b/drivers/gpu/drm/i915/i915_request.c > @@ -767,16 +767,34 @@ i915_request_create(struct intel_context *ce) > static int > i915_request_await_start(struct i915_request *rq, struct i915_request *signal) > { > - if (list_is_first(&signal->link, &signal->timeline->requests)) > + struct intel_timeline *tl = signal->timeline; > + struct dma_fence *fence; > + int err; > + > + if (list_is_first(&signal->link, &tl->requests)) > return 0; > > - signal = list_prev_entry(signal, link); > - if (intel_timeline_sync_is_later(rq->timeline, &signal->fence)) > + if (mutex_lock_interruptible_nested(&tl->mutex, SINGLE_DEPTH_NESTING)) We are getting more and more these nested ones and it will become unmanageable to remember which ones, why and on what paths. Perhaps we need a comment next to the member in the structure definition? > + return -EINTR; > + > + if (list_is_first(&signal->link, &tl->requests)) { > + fence = NULL; > + } else { > + signal = list_prev_entry(signal, link); > + fence = dma_fence_get_rcu(&signal->fence); > + } > + mutex_unlock(&tl->mutex); > + if (!fence) Can it be no fence when it was obtained under the mutex? > return 0; > > - return i915_sw_fence_await_dma_fence(&rq->submit, > - &signal->fence, 0, > - I915_FENCE_GFP); > + err = 0; > + if (!intel_timeline_sync_is_later(rq->timeline, fence)) > + err = i915_sw_fence_await_dma_fence(&rq->submit, > + fence, 0, > + I915_FENCE_GFP); > + dma_fence_put(fence); > + > + return err; > } > > static intel_engine_mask_t > Regards, Tvrtko
Quoting Tvrtko Ursulin (2019-09-17 15:51:45) > > On 17/09/2019 08:43, Chris Wilson wrote: > > As we need to take a walk back along the signaler timeline to find the > > fence before upon which we want to wait, we need to lock that timeline > > to prevent it being modified as we walk. Similarly, we also need to > > acquire a reference to the earlier fence while it still exists! > > > > Though we lack the correct locking today, we are saved by the > > overarching struct_mutex -- but that protection is being removed. > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> > > --- > > drivers/gpu/drm/i915/i915_request.c | 30 +++++++++++++++++++++++------ > > 1 file changed, 24 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c > > index f12358150097..452ad7a8ff0c 100644 > > --- a/drivers/gpu/drm/i915/i915_request.c > > +++ b/drivers/gpu/drm/i915/i915_request.c > > @@ -767,16 +767,34 @@ i915_request_create(struct intel_context *ce) > > static int > > i915_request_await_start(struct i915_request *rq, struct i915_request *signal) > > { > > - if (list_is_first(&signal->link, &signal->timeline->requests)) > > + struct intel_timeline *tl = signal->timeline; > > + struct dma_fence *fence; > > + int err; > > + > > + if (list_is_first(&signal->link, &tl->requests)) > > return 0; > > > > - signal = list_prev_entry(signal, link); > > - if (intel_timeline_sync_is_later(rq->timeline, &signal->fence)) > > + if (mutex_lock_interruptible_nested(&tl->mutex, SINGLE_DEPTH_NESTING)) > > We are getting more and more these nested ones and it will become > unmanageable to remember which ones, why and on what paths. Perhaps we > need a comment next to the member in the structure definition? The timeline mutex is held for request construction and retire; entry and exit of the timeline->requests. Since we have a request, it should be holding its rq->timeline->mutex; is that what you wish documented? Similarly that signal->timeline != rq->timeline. GEM_BUG_ON(signal->timeline == rq->timeline); lockdep_assert_held(&rq->timeline->mutex); > > + return -EINTR; > > + > > + if (list_is_first(&signal->link, &tl->requests)) { > > + fence = NULL; > > + } else { > > + signal = list_prev_entry(signal, link); > > + fence = dma_fence_get_rcu(&signal->fence); > > + } > > + mutex_unlock(&tl->mutex); > > + if (!fence) > > Can it be no fence when it was obtained under the mutex? Yes. The previous fence may have been retired and so removed from the tl->requests before we acquire the mutex. It should not be freed while it is still in the list, i.e. dma_fence_get_rcu() should never return NULL. -Chris
On 17/09/2019 16:04, Chris Wilson wrote: > Quoting Tvrtko Ursulin (2019-09-17 15:51:45) >> >> On 17/09/2019 08:43, Chris Wilson wrote: >>> As we need to take a walk back along the signaler timeline to find the >>> fence before upon which we want to wait, we need to lock that timeline >>> to prevent it being modified as we walk. Similarly, we also need to >>> acquire a reference to the earlier fence while it still exists! >>> >>> Though we lack the correct locking today, we are saved by the >>> overarching struct_mutex -- but that protection is being removed. >>> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> >>> --- >>> drivers/gpu/drm/i915/i915_request.c | 30 +++++++++++++++++++++++------ >>> 1 file changed, 24 insertions(+), 6 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c >>> index f12358150097..452ad7a8ff0c 100644 >>> --- a/drivers/gpu/drm/i915/i915_request.c >>> +++ b/drivers/gpu/drm/i915/i915_request.c >>> @@ -767,16 +767,34 @@ i915_request_create(struct intel_context *ce) >>> static int >>> i915_request_await_start(struct i915_request *rq, struct i915_request *signal) >>> { >>> - if (list_is_first(&signal->link, &signal->timeline->requests)) >>> + struct intel_timeline *tl = signal->timeline; >>> + struct dma_fence *fence; >>> + int err; >>> + >>> + if (list_is_first(&signal->link, &tl->requests)) >>> return 0; >>> >>> - signal = list_prev_entry(signal, link); >>> - if (intel_timeline_sync_is_later(rq->timeline, &signal->fence)) >>> + if (mutex_lock_interruptible_nested(&tl->mutex, SINGLE_DEPTH_NESTING)) >> >> We are getting more and more these nested ones and it will become >> unmanageable to remember which ones, why and on what paths. Perhaps we >> need a comment next to the member in the structure definition? > > The timeline mutex is held for request construction and retire; entry > and exit of the timeline->requests. Since we have a request, it should > be holding its rq->timeline->mutex; is that what you wish documented? > > Similarly that signal->timeline != rq->timeline. > > GEM_BUG_ON(signal->timeline == rq->timeline); > lockdep_assert_held(&rq->timeline->mutex); Yeah that helps. >>> + return -EINTR; >>> + >>> + if (list_is_first(&signal->link, &tl->requests)) { >>> + fence = NULL; >>> + } else { >>> + signal = list_prev_entry(signal, link); >>> + fence = dma_fence_get_rcu(&signal->fence); >>> + } >>> + mutex_unlock(&tl->mutex); >>> + if (!fence) >> >> Can it be no fence when it was obtained under the mutex? > > Yes. The previous fence may have been retired and so removed from the > tl->requests before we acquire the mutex. It should not be freed while > it is still in the list, i.e. dma_fence_get_rcu() should never return > NULL. I did not worry about the list_is_first check, just about dma_fence_get_rcu. At least I'm happy that one is safe and v2 is even better in this respect. Regards, Tvrtko
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index f12358150097..452ad7a8ff0c 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -767,16 +767,34 @@ i915_request_create(struct intel_context *ce) static int i915_request_await_start(struct i915_request *rq, struct i915_request *signal) { - if (list_is_first(&signal->link, &signal->timeline->requests)) + struct intel_timeline *tl = signal->timeline; + struct dma_fence *fence; + int err; + + if (list_is_first(&signal->link, &tl->requests)) return 0; - signal = list_prev_entry(signal, link); - if (intel_timeline_sync_is_later(rq->timeline, &signal->fence)) + if (mutex_lock_interruptible_nested(&tl->mutex, SINGLE_DEPTH_NESTING)) + return -EINTR; + + if (list_is_first(&signal->link, &tl->requests)) { + fence = NULL; + } else { + signal = list_prev_entry(signal, link); + fence = dma_fence_get_rcu(&signal->fence); + } + mutex_unlock(&tl->mutex); + if (!fence) return 0; - return i915_sw_fence_await_dma_fence(&rq->submit, - &signal->fence, 0, - I915_FENCE_GFP); + err = 0; + if (!intel_timeline_sync_is_later(rq->timeline, fence)) + err = i915_sw_fence_await_dma_fence(&rq->submit, + fence, 0, + I915_FENCE_GFP); + dma_fence_put(fence); + + return err; } static intel_engine_mask_t
As we need to take a walk back along the signaler timeline to find the fence before upon which we want to wait, we need to lock that timeline to prevent it being modified as we walk. Similarly, we also need to acquire a reference to the earlier fence while it still exists! Though we lack the correct locking today, we are saved by the overarching struct_mutex -- but that protection is being removed. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> --- drivers/gpu/drm/i915/i915_request.c | 30 +++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-)