
drm/i915: Ensure associated VMAs are inactive when contexts are destroyed

Message ID 562E263B.2040307@linux.intel.com (mailing list archive)
State New, archived

Commit Message

Tvrtko Ursulin Oct. 26, 2015, 1:10 p.m. UTC
On 26/10/15 12:10, Chris Wilson wrote:
> On Mon, Oct 26, 2015 at 12:00:06PM +0000, Tvrtko Ursulin wrote:
>>
>> On 26/10/15 11:23, Chris Wilson wrote:
>>> On Mon, Oct 26, 2015 at 11:05:03AM +0000, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> In the following commit:
>>>>
>>>>      commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>>>>      Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>      Date:   Mon Oct 5 13:26:36 2015 +0100
>>>>
>>>>          drm/i915: Clean up associated VMAs on context destruction
>>>>
>>>> I added a WARN_ON assertion that the VM's active list must be
>>>> empty at the time the owning context is freed, but that turned
>>>> out to be a wrong assumption.
>>>>
>>>> Due to the ordering of operations in i915_gem_object_retire__read,
>>>> where contexts are unreferenced before VMAs are moved to the
>>>> inactive list, the described situation can in fact happen.
>>>
>>> The context is being unreferenced indirectly. Adding a direct reference
>>> here is even more bizarre.
>>
>> Perhaps it is not the prettiest, but it sounds logical to me to
>> ensure that the order of destruction of the involved object
>> hierarchy goes from the bottom up and is not interleaved.
>>
>> If you consider the active/inactive list position as part of the
>> retire process, then doing it at the very place in the code, and on
>> the very object, that looked to be destroyed out of sequence sounded
>> logical to me.
>>
>> How would you do it, can you think of a better way?
> 
> The reference is via the request. We are handling requests, so it
> makes more sense that you take the reference on the request.

Hm, so you would be happy with:

> I would just revert the patch, it doesn't fix the problem you tried to
> solve and just adds more.

It solves one problem, just not all of them.

Regards,

Tvrtko
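
For reference, the assertion being discussed is the one commit e9f24d5
added to the context destructor. A rough sketch of that hunk follows,
reconstructed from the commit description rather than copied from the
tree, so details may differ:

static void i915_gem_context_clean(struct intel_context *ctx)
{
	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
	struct i915_vma *vma, *next;

	if (!ppgtt)
		return;

	/* The assertion in question: the context is being freed, so its
	 * VM is expected to have no VMAs left on the active list. */
	WARN_ON(!list_empty(&ppgtt->base.active_list));

	/* Unbind whatever remains on the inactive list. */
	list_for_each_entry_safe(vma, next, &ppgtt->base.inactive_list,
				 mm_list) {
		if (WARN_ON(__i915_vma_unbind_no_wait(vma)))
			break;
	}
}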

Comments

Tvrtko Ursulin Nov. 3, 2015, 10:48 a.m. UTC | #1
On 26/10/15 13:10, Tvrtko Ursulin wrote:
>
> On 26/10/15 12:10, Chris Wilson wrote:
>> On Mon, Oct 26, 2015 at 12:00:06PM +0000, Tvrtko Ursulin wrote:
>>>
>>> On 26/10/15 11:23, Chris Wilson wrote:
>>>> On Mon, Oct 26, 2015 at 11:05:03AM +0000, Tvrtko Ursulin wrote:
>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>
>>>>> In the following commit:
>>>>>
>>>>>       commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>>>>>       Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>       Date:   Mon Oct 5 13:26:36 2015 +0100
>>>>>
>>>>>           drm/i915: Clean up associated VMAs on context destruction
>>>>>
>>>>> I added a WARN_ON assertion that the VM's active list must be
>>>>> empty at the time the owning context is freed, but that turned
>>>>> out to be a wrong assumption.
>>>>>
>>>>> Due to the ordering of operations in i915_gem_object_retire__read,
>>>>> where contexts are unreferenced before VMAs are moved to the
>>>>> inactive list, the described situation can in fact happen.
>>>>
>>>> The context is being unreferenced indirectly. Adding a direct reference
>>>> here is even more bizarre.
>>>
>>> Perhaps it is not the prettiest, but it sounds logical to me to
>>> ensure that the order of destruction of the involved object
>>> hierarchy goes from the bottom up and is not interleaved.
>>>
>>> If you consider the active/inactive list position as part of the
>>> retire process, then doing it at the very place in the code, and on
>>> the very object, that looked to be destroyed out of sequence sounded
>>> logical to me.
>>>
>>> How would you do it, can you think of a better way?
>>
>> The reference is via the request. We are handling requests, so it
>> makes more sense that you take the reference on the request.
>
> Hm, so you would be happy with:
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 9b2048c7077d..c238481a8090 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2373,19 +2373,26 @@ static void
>   i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>   {
>          struct i915_vma *vma;
> +       struct drm_i915_gem_request *req;
>
>          RQ_BUG_ON(obj->last_read_req[ring] == NULL);
>          RQ_BUG_ON(!(obj->active & (1 << ring)));
>
>          list_del_init(&obj->ring_list[ring]);
> +
> +       /* Ensure context cannot be destroyed with VMAs on the active list. */
> +       req = i915_gem_request_reference(obj->last_read_req[ring]);
> +
>          i915_gem_request_assign(&obj->last_read_req[ring], NULL);
>
>          if (obj->last_write_req && obj->last_write_req->ring->id == ring)
>                  i915_gem_object_retire__write(obj);
>
>          obj->active &= ~(1 << ring);
> -       if (obj->active)
> +       if (obj->active) {
> +               i915_gem_request_unreference(req);
>                  return;
> +       }
>
>          /* Bump our place on the bound list to keep it roughly in LRU order
>           * so that we don't steal from recently used but inactive objects
> @@ -2399,6 +2406,8 @@ i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
>                          list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
>          }
>
> +       i915_gem_request_unreference(req);
> +
>          i915_gem_request_assign(&obj->last_fenced_req, NULL);
>          drm_gem_object_unreference(&obj->base);
>   }
>


Ping on this?

Regards,

Tvrtko
Chris Wilson Nov. 3, 2015, 10:55 a.m. UTC | #2
On Mon, Oct 26, 2015 at 01:10:19PM +0000, Tvrtko Ursulin wrote:
> 
> On 26/10/15 12:10, Chris Wilson wrote:
> > On Mon, Oct 26, 2015 at 12:00:06PM +0000, Tvrtko Ursulin wrote:
> >>
> >> On 26/10/15 11:23, Chris Wilson wrote:
> >>> On Mon, Oct 26, 2015 at 11:05:03AM +0000, Tvrtko Ursulin wrote:
> >>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>
> >>>> In the following commit:
> >>>>
> >>>>      commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
> >>>>      Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>      Date:   Mon Oct 5 13:26:36 2015 +0100
> >>>>
> >>>>          drm/i915: Clean up associated VMAs on context destruction
> >>>>
> >>>> I added a WARN_ON assertion that the VM's active list must be
> >>>> empty at the time the owning context is freed, but that turned
> >>>> out to be a wrong assumption.
> >>>>
> >>>> Due to the ordering of operations in i915_gem_object_retire__read,
> >>>> where contexts are unreferenced before VMAs are moved to the
> >>>> inactive list, the described situation can in fact happen.
> >>>
> >>> The context is being unreferenced indirectly. Adding a direct reference
> >>> here is even more bizarre.
> >>
> >> Perhaps it is not the prettiest, but it sounds logical to me to
> >> ensure that the order of destruction of the involved object
> >> hierarchy goes from the bottom up and is not interleaved.
> >>
> >> If you consider the active/inactive list position as part of the
> >> retire process, then doing it at the very place in the code, and on
> >> the very object, that looked to be destroyed out of sequence sounded
> >> logical to me.
> >>
> >> How would you do it, can you think of a better way?
> > 
> > The reference is via the request. We are handling requests, so it
> > makes more sense that you take the reference on the request.
> 
> Hm, so you would be happy with:

Go up another level. There is just one callsite where the reference
needs to be added across the call.

And no, I would not be happy, as I see this as just further increasing
the technical debt.
-Chris
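
A sketch of the alternative being suggested, assuming the callsite in
question is the active-list loop in i915_gem_retire_requests_ring()
(a hypothetical illustration only, not a posted patch; the surrounding
code is abridged and may differ from the tree at the time):

	/* Pin the request -- and therefore the context and VM it holds --
	 * in the caller, across the retire call, instead of inside
	 * i915_gem_object_retire__read() itself. */
	while (!list_empty(&ring->active_list)) {
		struct drm_i915_gem_object *obj;
		struct drm_i915_gem_request *req;

		obj = list_first_entry(&ring->active_list,
				       struct drm_i915_gem_object,
				       ring_list[ring->id]);

		if (!list_empty(&obj->last_read_req[ring->id]->list))
			break;

		req = i915_gem_request_reference(obj->last_read_req[ring->id]);
		i915_gem_object_retire__read(obj, ring->id);
		i915_gem_request_unreference(req);
	}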
Tvrtko Ursulin Nov. 3, 2015, 11:08 a.m. UTC | #3
On 03/11/15 10:55, Chris Wilson wrote:
> On Mon, Oct 26, 2015 at 01:10:19PM +0000, Tvrtko Ursulin wrote:
>>
>> On 26/10/15 12:10, Chris Wilson wrote:
>>> On Mon, Oct 26, 2015 at 12:00:06PM +0000, Tvrtko Ursulin wrote:
>>>>
>>>> On 26/10/15 11:23, Chris Wilson wrote:
>>>>> On Mon, Oct 26, 2015 at 11:05:03AM +0000, Tvrtko Ursulin wrote:
>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>
>>>>>> In the following commit:
>>>>>>
>>>>>>       commit e9f24d5fb7cf3628b195b18ff3ac4e37937ceeae
>>>>>>       Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>       Date:   Mon Oct 5 13:26:36 2015 +0100
>>>>>>
>>>>>>           drm/i915: Clean up associated VMAs on context destruction
>>>>>>
>>>>>> I added a WARN_ON assertion that the VM's active list must be
>>>>>> empty at the time the owning context is freed, but that turned
>>>>>> out to be a wrong assumption.
>>>>>>
>>>>>> Due to the ordering of operations in i915_gem_object_retire__read,
>>>>>> where contexts are unreferenced before VMAs are moved to the
>>>>>> inactive list, the described situation can in fact happen.
>>>>>
>>>>> The context is being unreferenced indirectly. Adding a direct reference
>>>>> here is even more bizarre.
>>>>
>>>> Perhaps it is not the prettiest, but it sounds logical to me to
>>>> ensure that the order of destruction of the involved object
>>>> hierarchy goes from the bottom up and is not interleaved.
>>>>
>>>> If you consider the active/inactive list position as part of the
>>>> retire process, then doing it at the very place in the code, and on
>>>> the very object, that looked to be destroyed out of sequence sounded
>>>> logical to me.
>>>>
>>>> How would you do it, can you think of a better way?
>>>
>>> The reference is via the request. We are handling requests, so it
>>> makes more sense that you take the reference on the request.
>>
>> Hm, so you would be happy with:
>
> Go up another level. There is just one callsite where the reference
> needs to be added across the call.

i915_gem_retire_requests_ring? Why do you think that is more logical?

To me it sounds really clean to do it in the place which deals with
moving VMAs to the inactive list. It is localized and clear then that
it is fixing the illogic of allowing the context destructor to run
with VMAs still on the active list.

> And no, I would not be happy as I see this as just futher increasing the
> technical debt.

I thought we had agreed it is better to fix up what we have quickly, to
the extent that it is feasible, and work towards the rewrite over time.

Regards,

Tvrtko

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9b2048c7077d..c238481a8090 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2373,19 +2373,26 @@  static void
 i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
 {
        struct i915_vma *vma;
+       struct drm_i915_gem_request *req;
 
        RQ_BUG_ON(obj->last_read_req[ring] == NULL);
        RQ_BUG_ON(!(obj->active & (1 << ring)));
 
        list_del_init(&obj->ring_list[ring]);
+
+       /* Ensure context cannot be destroyed with VMAs on the active list. */
+       req = i915_gem_request_reference(obj->last_read_req[ring]);
+
        i915_gem_request_assign(&obj->last_read_req[ring], NULL);
 
        if (obj->last_write_req && obj->last_write_req->ring->id == ring)
                i915_gem_object_retire__write(obj);
 
        obj->active &= ~(1 << ring);
-       if (obj->active)
+       if (obj->active) {
+               i915_gem_request_unreference(req);
                return;
+       }
 
        /* Bump our place on the bound list to keep it roughly in LRU order
         * so that we don't steal from recently used but inactive objects
@@ -2399,6 +2406,8 @@  i915_gem_object_retire__read(struct drm_i915_gem_object *obj, int ring)
                        list_move_tail(&vma->mm_list, &vma->vm->inactive_list);
        }
 
+       i915_gem_request_unreference(req);
+
        i915_gem_request_assign(&obj->last_fenced_req, NULL);
        drm_gem_object_unreference(&obj->base);
 }