diff mbox series

drm/i915: Lock signaler timeline while navigating

Message ID 20190917074312.12290-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show
Series drm/i915: Lock signaler timeline while navigating | expand

Commit Message

Chris Wilson Sept. 17, 2019, 7:43 a.m. UTC
As we need to take a walk back along the signaler timeline to find the
fence before upon which we want to wait, we need to lock that timeline
to prevent it being modified as we walk. Similarly, we also need to
acquire a reference to the earlier fence while it still exists!

Though we lack the correct locking today, we are saved by the
overarching struct_mutex -- but that protection is being removed.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c | 30 +++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

Comments

Tvrtko Ursulin Sept. 17, 2019, 2:51 p.m. UTC | #1
On 17/09/2019 08:43, Chris Wilson wrote:
> As we need to take a walk back along the signaler timeline to find the
> fence before upon which we want to wait, we need to lock that timeline
> to prevent it being modified as we walk. Similarly, we also need to
> acquire a reference to the earlier fence while it still exists!
> 
> Though we lack the correct locking today, we are saved by the
> overarching struct_mutex -- but that protection is being removed.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 30 +++++++++++++++++++++++------
>   1 file changed, 24 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index f12358150097..452ad7a8ff0c 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -767,16 +767,34 @@ i915_request_create(struct intel_context *ce)
>   static int
>   i915_request_await_start(struct i915_request *rq, struct i915_request *signal)
>   {
> -	if (list_is_first(&signal->link, &signal->timeline->requests))
> +	struct intel_timeline *tl = signal->timeline;
> +	struct dma_fence *fence;
> +	int err;
> +
> +	if (list_is_first(&signal->link, &tl->requests))
>   		return 0;
>   
> -	signal = list_prev_entry(signal, link);
> -	if (intel_timeline_sync_is_later(rq->timeline, &signal->fence))
> +	if (mutex_lock_interruptible_nested(&tl->mutex, SINGLE_DEPTH_NESTING))

We are getting more and more these nested ones and it will become 
unmanageable to remember which ones, why and on what paths. Perhaps we 
need a comment next to the member in the structure definition?

> +		return -EINTR;
> +
> +	if (list_is_first(&signal->link, &tl->requests)) {
> +		fence = NULL;
> +	} else {
> +		signal = list_prev_entry(signal, link);
> +		fence = dma_fence_get_rcu(&signal->fence);
> +	}
> +	mutex_unlock(&tl->mutex);
> +	if (!fence)

Can it be no fence when it was obtained under the mutex?

>   		return 0;
>   
> -	return i915_sw_fence_await_dma_fence(&rq->submit,
> -					     &signal->fence, 0,
> -					     I915_FENCE_GFP);
> +	err = 0;
> +	if (!intel_timeline_sync_is_later(rq->timeline, fence))
> +		err = i915_sw_fence_await_dma_fence(&rq->submit,
> +						    fence, 0,
> +						    I915_FENCE_GFP);
> +	dma_fence_put(fence);
> +
> +	return err;
>   }
>   
>   static intel_engine_mask_t
> 

Regards,

Tvrtko
Chris Wilson Sept. 17, 2019, 3:04 p.m. UTC | #2
Quoting Tvrtko Ursulin (2019-09-17 15:51:45)
> 
> On 17/09/2019 08:43, Chris Wilson wrote:
> > As we need to take a walk back along the signaler timeline to find the
> > fence before upon which we want to wait, we need to lock that timeline
> > to prevent it being modified as we walk. Similarly, we also need to
> > acquire a reference to the earlier fence while it still exists!
> > 
> > Though we lack the correct locking today, we are saved by the
> > overarching struct_mutex -- but that protection is being removed.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_request.c | 30 +++++++++++++++++++++++------
> >   1 file changed, 24 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index f12358150097..452ad7a8ff0c 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -767,16 +767,34 @@ i915_request_create(struct intel_context *ce)
> >   static int
> >   i915_request_await_start(struct i915_request *rq, struct i915_request *signal)
> >   {
> > -     if (list_is_first(&signal->link, &signal->timeline->requests))
> > +     struct intel_timeline *tl = signal->timeline;
> > +     struct dma_fence *fence;
> > +     int err;
> > +
> > +     if (list_is_first(&signal->link, &tl->requests))
> >               return 0;
> >   
> > -     signal = list_prev_entry(signal, link);
> > -     if (intel_timeline_sync_is_later(rq->timeline, &signal->fence))
> > +     if (mutex_lock_interruptible_nested(&tl->mutex, SINGLE_DEPTH_NESTING))
> 
> We are getting more and more these nested ones and it will become 
> unmanageable to remember which ones, why and on what paths. Perhaps we 
> need a comment next to the member in the structure definition?

The timeline mutex is held for request construction and retire; entry
and exit of the timeline->requests. Since we have a request, it should
be holding its rq->timeline->mutex; is that what you wish documented?

Similarly that signal->timeline != rq->timeline.

GEM_BUG_ON(signal->timeline == rq->timeline);
lockdep_assert_held(&rq->timeline->mutex);

> > +             return -EINTR;
> > +
> > +     if (list_is_first(&signal->link, &tl->requests)) {
> > +             fence = NULL;
> > +     } else {
> > +             signal = list_prev_entry(signal, link);
> > +             fence = dma_fence_get_rcu(&signal->fence);
> > +     }
> > +     mutex_unlock(&tl->mutex);
> > +     if (!fence)
> 
> Can it be no fence when it was obtained under the mutex?

Yes. The previous fence may have been retired and so removed from the
tl->requests before we acquire the mutex. It should not be freed while
it is still in the list, i.e. dma_fence_get_rcu() should never return
NULL.
-Chris
Tvrtko Ursulin Sept. 17, 2019, 3:22 p.m. UTC | #3
On 17/09/2019 16:04, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-09-17 15:51:45)
>>
>> On 17/09/2019 08:43, Chris Wilson wrote:
>>> As we need to take a walk back along the signaler timeline to find the
>>> fence before upon which we want to wait, we need to lock that timeline
>>> to prevent it being modified as we walk. Similarly, we also need to
>>> acquire a reference to the earlier fence while it still exists!
>>>
>>> Though we lack the correct locking today, we are saved by the
>>> overarching struct_mutex -- but that protection is being removed.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/i915_request.c | 30 +++++++++++++++++++++++------
>>>    1 file changed, 24 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>>> index f12358150097..452ad7a8ff0c 100644
>>> --- a/drivers/gpu/drm/i915/i915_request.c
>>> +++ b/drivers/gpu/drm/i915/i915_request.c
>>> @@ -767,16 +767,34 @@ i915_request_create(struct intel_context *ce)
>>>    static int
>>>    i915_request_await_start(struct i915_request *rq, struct i915_request *signal)
>>>    {
>>> -     if (list_is_first(&signal->link, &signal->timeline->requests))
>>> +     struct intel_timeline *tl = signal->timeline;
>>> +     struct dma_fence *fence;
>>> +     int err;
>>> +
>>> +     if (list_is_first(&signal->link, &tl->requests))
>>>                return 0;
>>>    
>>> -     signal = list_prev_entry(signal, link);
>>> -     if (intel_timeline_sync_is_later(rq->timeline, &signal->fence))
>>> +     if (mutex_lock_interruptible_nested(&tl->mutex, SINGLE_DEPTH_NESTING))
>>
>> We are getting more and more these nested ones and it will become
>> unmanageable to remember which ones, why and on what paths. Perhaps we
>> need a comment next to the member in the structure definition?
> 
> The timeline mutex is held for request construction and retire; entry
> and exit of the timeline->requests. Since we have a request, it should
> be holding its rq->timeline->mutex; is that what you wish documented?
> 
> Similarly that signal->timeline != rq->timeline.
> 
> GEM_BUG_ON(signal->timeline == rq->timeline);
> lockdep_assert_held(&rq->timeline->mutex);

Yeah that helps.

>>> +             return -EINTR;
>>> +
>>> +     if (list_is_first(&signal->link, &tl->requests)) {
>>> +             fence = NULL;
>>> +     } else {
>>> +             signal = list_prev_entry(signal, link);
>>> +             fence = dma_fence_get_rcu(&signal->fence);
>>> +     }
>>> +     mutex_unlock(&tl->mutex);
>>> +     if (!fence)
>>
>> Can it be no fence when it was obtained under the mutex?
> 
> Yes. The previous fence may have been retired and so removed from the
> tl->requests before we acquire the mutex. It should not be freed while
> it is still in the list, i.e. dma_fence_get_rcu() should never return
> NULL.

I did not worry about the list_is_first check, just about 
dma_fence_get_rcu. At least I'm happy that one is safe and v2 is even 
better in this respect.

Regards,

Tvrtko
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f12358150097..452ad7a8ff0c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -767,16 +767,34 @@  i915_request_create(struct intel_context *ce)
 static int
 i915_request_await_start(struct i915_request *rq, struct i915_request *signal)
 {
-	if (list_is_first(&signal->link, &signal->timeline->requests))
+	struct intel_timeline *tl = signal->timeline;
+	struct dma_fence *fence;
+	int err;
+
+	if (list_is_first(&signal->link, &tl->requests))
 		return 0;
 
-	signal = list_prev_entry(signal, link);
-	if (intel_timeline_sync_is_later(rq->timeline, &signal->fence))
+	if (mutex_lock_interruptible_nested(&tl->mutex, SINGLE_DEPTH_NESTING))
+		return -EINTR;
+
+	if (list_is_first(&signal->link, &tl->requests)) {
+		fence = NULL;
+	} else {
+		signal = list_prev_entry(signal, link);
+		fence = dma_fence_get_rcu(&signal->fence);
+	}
+	mutex_unlock(&tl->mutex);
+	if (!fence)
 		return 0;
 
-	return i915_sw_fence_await_dma_fence(&rq->submit,
-					     &signal->fence, 0,
-					     I915_FENCE_GFP);
+	err = 0;
+	if (!intel_timeline_sync_is_later(rq->timeline, fence))
+		err = i915_sw_fence_await_dma_fence(&rq->submit,
+						    fence, 0,
+						    I915_FENCE_GFP);
+	dma_fence_put(fence);
+
+	return err;
 }
 
 static intel_engine_mask_t