diff mbox

[v2] drm/i915: Track the last-active inside the i915_vma

Message ID 20180704083422.25572-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson July 4, 2018, 8:34 a.m. UTC
Using a VMA on more than one timeline concurrently is the exception
rather than the rule (using it concurrently on multiple engines). As we
expect to only use one active tracker, store the most recently used
tracker inside the i915_vma itself and only fallback to the rbtree if
we need a second or more concurrent active trackers.

v2: Comments on how we overwrite any existing last_active cache.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_vma.c | 50 +++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_vma.h |  1 +
 2 files changed, 49 insertions(+), 2 deletions(-)

Comments

Tvrtko Ursulin July 4, 2018, 9:39 a.m. UTC | #1
On 04/07/2018 09:34, Chris Wilson wrote:
> Using a VMA on more than one timeline concurrently is the exception
> rather than the rule (using it concurrently on multiple engines). As we
> expect to only use one active tracker, store the most recently used
> tracker inside the i915_vma itself and only fallback to the rbtree if
> we need a second or more concurrent active trackers.
> 
> v2: Comments on how we overwrite any existing last_active cache.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_vma.c | 50 +++++++++++++++++++++++++++++++--
>   drivers/gpu/drm/i915/i915_vma.h |  1 +
>   2 files changed, 49 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index cd94ffc7f079..33925e00f7e8 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -119,6 +119,12 @@ i915_vma_retire(struct i915_gem_active *base, struct i915_request *rq)
>   	__i915_vma_retire(active->vma, rq);
>   }
>   
> +static void
> +i915_vma_last_retire(struct i915_gem_active *base, struct i915_request *rq)
> +{
> +	__i915_vma_retire(container_of(base, struct i915_vma, last_active), rq);
> +}
> +
>   static struct i915_vma *
>   vma_create(struct drm_i915_gem_object *obj,
>   	   struct i915_address_space *vm,
> @@ -136,6 +142,7 @@ vma_create(struct drm_i915_gem_object *obj,
>   
>   	vma->active = RB_ROOT;
>   
> +	init_request_active(&vma->last_active, i915_vma_last_retire);
>   	init_request_active(&vma->last_fence, NULL);
>   	vma->vm = vm;
>   	vma->ops = &vm->vma_ops;
> @@ -895,6 +902,22 @@ static struct i915_gem_active *lookup_active(struct i915_vma *vma, u64 idx)
>   {
>   	struct i915_vma_active *active;
>   	struct rb_node **p, *parent;
> +	struct i915_request *old;
> +
> +	/*
> +	 * We track the most recently used timeline to skip a rbtree search
> +	 * for the common case, under typical loads we never need the rbtree
> +	 * at all. We can reuse the last_active slot if it is empty, that is
> +	 * after the previous activity has been retired, or if the active
> +	 * matches the current timeline.
> +	 */
> +	old = i915_gem_active_raw(&vma->last_active,
> +				  &vma->vm->i915->drm.struct_mutex);
> +	if (!old || old->fence.context == idx)
> +		goto out;
> +
> +	/* Move the currently active fence into the rbtree */
> +	idx = old->fence.context;
>   
>   	parent = NULL;
>   	p = &vma->active.rb_node;
> @@ -903,7 +926,7 @@ static struct i915_gem_active *lookup_active(struct i915_vma *vma, u64 idx)
>   
>   		active = rb_entry(parent, struct i915_vma_active, node);
>   		if (active->timeline == idx)
> -			return &active->base;
> +			goto replace;
>   
>   		if (active->timeline < idx)
>   			p = &parent->rb_right;
> @@ -922,7 +945,25 @@ static struct i915_gem_active *lookup_active(struct i915_vma *vma, u64 idx)
>   	rb_link_node(&active->node, parent, p);
>   	rb_insert_color(&active->node, &vma->active);
>   
> -	return &active->base;
> +replace:
> +	/*
> +	 * Overwrite the previous active slot in the rbtree with last_active,
> +	 * leaving last_active zeroed. If the previous slot is still active,
> +	 * we must be careful as we now only expect to recieve one retire

typo in receive

> +	 * callback not two, and so must undo the active counting for the
> +	 * overwritten slot.
> +	 */
> +	if (i915_gem_active_isset(&active->base)) {
> +		__list_del_entry(&active->base.link);
> +		vma->active_count--;
 > +		GEM_BUG_ON(!vma->active_count);

I still don't get this. The cache is exclusive, so when transferring a 
record from rbtree to last_active, why do we need to decrement the 
vma->active_count here? Don't get the part in the comment about two 
retires - do you really sometimes expect two - ie cache is not exclusive?

But the fact that lookup of a cached entry is a straight return, meaning 
vma->active_count is manipulated elsewhere, makes me think it is 
avoidable messing with it on this path as well.

Maybe the separation of duties between the callers and this function 
needs to be stronger.

Regards,

Tvrtko

> +	}
> +	GEM_BUG_ON(list_empty(&vma->last_active.link));
> +	list_replace_init(&vma->last_active.link, &active->base.link);
> +	active->base.request = fetch_and_zero(&vma->last_active.request);
> +
> +out:
> +	return &vma->last_active;
>   }
>   
>   int i915_vma_move_to_active(struct i915_vma *vma,
> @@ -1002,6 +1043,11 @@ int i915_vma_unbind(struct i915_vma *vma)
>   		 */
>   		__i915_vma_pin(vma);
>   
> +		ret = i915_gem_active_retire(&vma->last_active,
> +					     &vma->vm->i915->drm.struct_mutex);
> +		if (ret)
> +			goto unpin;
> +
>   		rbtree_postorder_for_each_entry_safe(active, n,
>   						     &vma->active, node) {
>   			ret = i915_gem_active_retire(&active->base,
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index c297b0a0dc47..f06d66377107 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -97,6 +97,7 @@ struct i915_vma {
>   
>   	unsigned int active_count;
>   	struct rb_root active;
> +	struct i915_gem_active last_active;
>   	struct i915_gem_active last_fence;
>   
>   	/**
>
Tvrtko Ursulin July 4, 2018, 11:34 a.m. UTC | #2
On 04/07/2018 10:39, Tvrtko Ursulin wrote:
> 
> On 04/07/2018 09:34, Chris Wilson wrote:
>> Using a VMA on more than one timeline concurrently is the exception
>> rather than the rule (using it concurrently on multiple engines). As we
>> expect to only use one active tracker, store the most recently used
>> tracker inside the i915_vma itself and only fallback to the rbtree if
>> we need a second or more concurrent active trackers.
>>
>> v2: Comments on how we overwrite any existing last_active cache.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_vma.c | 50 +++++++++++++++++++++++++++++++--
>>   drivers/gpu/drm/i915/i915_vma.h |  1 +
>>   2 files changed, 49 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
>> b/drivers/gpu/drm/i915/i915_vma.c
>> index cd94ffc7f079..33925e00f7e8 100644
>> --- a/drivers/gpu/drm/i915/i915_vma.c
>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>> @@ -119,6 +119,12 @@ i915_vma_retire(struct i915_gem_active *base, 
>> struct i915_request *rq)
>>       __i915_vma_retire(active->vma, rq);
>>   }
>> +static void
>> +i915_vma_last_retire(struct i915_gem_active *base, struct 
>> i915_request *rq)
>> +{
>> +    __i915_vma_retire(container_of(base, struct i915_vma, 
>> last_active), rq);
>> +}
>> +
>>   static struct i915_vma *
>>   vma_create(struct drm_i915_gem_object *obj,
>>          struct i915_address_space *vm,
>> @@ -136,6 +142,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>       vma->active = RB_ROOT;
>> +    init_request_active(&vma->last_active, i915_vma_last_retire);
>>       init_request_active(&vma->last_fence, NULL);
>>       vma->vm = vm;
>>       vma->ops = &vm->vma_ops;
>> @@ -895,6 +902,22 @@ static struct i915_gem_active 
>> *lookup_active(struct i915_vma *vma, u64 idx)
>>   {
>>       struct i915_vma_active *active;
>>       struct rb_node **p, *parent;
>> +    struct i915_request *old;
>> +
>> +    /*
>> +     * We track the most recently used timeline to skip a rbtree search
>> +     * for the common case, under typical loads we never need the rbtree
>> +     * at all. We can reuse the last_active slot if it is empty, that is
>> +     * after the previous activity has been retired, or if the active
>> +     * matches the current timeline.
>> +     */
>> +    old = i915_gem_active_raw(&vma->last_active,
>> +                  &vma->vm->i915->drm.struct_mutex);
>> +    if (!old || old->fence.context == idx)
>> +        goto out;
>> +
>> +    /* Move the currently active fence into the rbtree */
>> +    idx = old->fence.context;
>>       parent = NULL;
>>       p = &vma->active.rb_node;
>> @@ -903,7 +926,7 @@ static struct i915_gem_active 
>> *lookup_active(struct i915_vma *vma, u64 idx)
>>           active = rb_entry(parent, struct i915_vma_active, node);
>>           if (active->timeline == idx)
>> -            return &active->base;
>> +            goto replace;
>>           if (active->timeline < idx)
>>               p = &parent->rb_right;
>> @@ -922,7 +945,25 @@ static struct i915_gem_active 
>> *lookup_active(struct i915_vma *vma, u64 idx)
>>       rb_link_node(&active->node, parent, p);
>>       rb_insert_color(&active->node, &vma->active);
>> -    return &active->base;
>> +replace:
>> +    /*
>> +     * Overwrite the previous active slot in the rbtree with 
>> last_active,
>> +     * leaving last_active zeroed. If the previous slot is still active,
>> +     * we must be careful as we now only expect to recieve one retire
> 
> typo in receive
> 
>> +     * callback not two, and so much undo the active counting for the
>> +     * overwritten slot.
>> +     */
>> +    if (i915_gem_active_isset(&active->base)) {
>> +        __list_del_entry(&active->base.link);
>> +        vma->active_count--;
>  > +        GEM_BUG_ON(!vma->active_count);
> 
> I still don't get this. The cache is exclusive, so when transferring a 
> record from rbtree to last_active, why do we need to decrement the 
> vma->active_count here? Don't get the part in the comment about two 
> retires - do you really sometimes expect two - ie cache is not exclusive?
> 
> But the fact that lookup of a cached entry is a straight return, meaning 
> vma->active_count is manipulated elsewhere, makes me think it is 
> avoidable messing with it on this path as well.
> 
> Maybe the separation of duties between the callers and this function 
> needs to be stronger.

Hmm or is your cache actually inclusive? Don't see no rbtree 
manipulation on migration to and from last_active/rbtree..

And since rbtree lookup is always for the last_active context id, you 
would otherwise never hit the "goto replace" path.

How do you ever look up an id which is not cached in last_active then?

I am thoroughly confused now..

Regards,

Tvrtko
Chris Wilson July 4, 2018, 11:47 a.m. UTC | #3
Quoting Tvrtko Ursulin (2018-07-04 12:34:04)
> 
> On 04/07/2018 10:39, Tvrtko Ursulin wrote:
> > 
> > On 04/07/2018 09:34, Chris Wilson wrote:
> >> Using a VMA on more than one timeline concurrently is the exception
> >> rather than the rule (using it concurrently on multiple engines). As we
> >> expect to only use one active tracker, store the most recently used
> >> tracker inside the i915_vma itself and only fallback to the rbtree if
> >> we need a second or more concurrent active trackers.
> >>
> >> v2: Comments on how we overwrite any existing last_active cache.
> >>
> >> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >> ---
> >>   drivers/gpu/drm/i915/i915_vma.c | 50 +++++++++++++++++++++++++++++++--
> >>   drivers/gpu/drm/i915/i915_vma.h |  1 +
> >>   2 files changed, 49 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_vma.c 
> >> b/drivers/gpu/drm/i915/i915_vma.c
> >> index cd94ffc7f079..33925e00f7e8 100644
> >> --- a/drivers/gpu/drm/i915/i915_vma.c
> >> +++ b/drivers/gpu/drm/i915/i915_vma.c
> >> @@ -119,6 +119,12 @@ i915_vma_retire(struct i915_gem_active *base, 
> >> struct i915_request *rq)
> >>       __i915_vma_retire(active->vma, rq);
> >>   }
> >> +static void
> >> +i915_vma_last_retire(struct i915_gem_active *base, struct 
> >> i915_request *rq)
> >> +{
> >> +    __i915_vma_retire(container_of(base, struct i915_vma, 
> >> last_active), rq);
> >> +}
> >> +
> >>   static struct i915_vma *
> >>   vma_create(struct drm_i915_gem_object *obj,
> >>          struct i915_address_space *vm,
> >> @@ -136,6 +142,7 @@ vma_create(struct drm_i915_gem_object *obj,
> >>       vma->active = RB_ROOT;
> >> +    init_request_active(&vma->last_active, i915_vma_last_retire);
> >>       init_request_active(&vma->last_fence, NULL);
> >>       vma->vm = vm;
> >>       vma->ops = &vm->vma_ops;
> >> @@ -895,6 +902,22 @@ static struct i915_gem_active 
> >> *lookup_active(struct i915_vma *vma, u64 idx)
> >>   {
> >>       struct i915_vma_active *active;
> >>       struct rb_node **p, *parent;
> >> +    struct i915_request *old;
> >> +
> >> +    /*
> >> +     * We track the most recently used timeline to skip a rbtree search
> >> +     * for the common case, under typical loads we never need the rbtree
> >> +     * at all. We can reuse the last_active slot if it is empty, that is
> >> +     * after the previous activity has been retired, or if the active
> >> +     * matches the current timeline.
> >> +     */
> >> +    old = i915_gem_active_raw(&vma->last_active,
> >> +                  &vma->vm->i915->drm.struct_mutex);
> >> +    if (!old || old->fence.context == idx)
> >> +        goto out;
> >> +
> >> +    /* Move the currently active fence into the rbtree */
> >> +    idx = old->fence.context;
> >>       parent = NULL;
> >>       p = &vma->active.rb_node;
> >> @@ -903,7 +926,7 @@ static struct i915_gem_active 
> >> *lookup_active(struct i915_vma *vma, u64 idx)
> >>           active = rb_entry(parent, struct i915_vma_active, node);
> >>           if (active->timeline == idx)
> >> -            return &active->base;
> >> +            goto replace;
> >>           if (active->timeline < idx)
> >>               p = &parent->rb_right;
> >> @@ -922,7 +945,25 @@ static struct i915_gem_active 
> >> *lookup_active(struct i915_vma *vma, u64 idx)
> >>       rb_link_node(&active->node, parent, p);
> >>       rb_insert_color(&active->node, &vma->active);
> >> -    return &active->base;
> >> +replace:
> >> +    /*
> >> +     * Overwrite the previous active slot in the rbtree with 
> >> last_active,
> >> +     * leaving last_active zeroed. If the previous slot is still active,
> >> +     * we must be careful as we now only expect to recieve one retire
> > 
> > typo in receive
> > 
> >> +     * callback not two, and so much undo the active counting for the
> >> +     * overwritten slot.
> >> +     */
> >> +    if (i915_gem_active_isset(&active->base)) {
> >> +        __list_del_entry(&active->base.link);
> >> +        vma->active_count--;
> >  > +        GEM_BUG_ON(!vma->active_count);
> > 
> > I still don't get this. The cache is exclusive, so when transferring a 
> > record from rbtree to last_active, why do we need to decrement the 
> > vma->active_count here? Don't get the part in the comment about two 
> > retires - do you really sometimes expect two - ie cache is not exclusive?
> > 
> > But the fact that lookup of a cached entry is a straight return, meaning 
> > vma->active_count is manipulated elsewhere, makes me think it is 
> > avoidable messing with it on this path as well.
> > 
> > Maybe the separation of duties between the callers and this function 
> > needs to be stronger.
> 
> Hmm or is your cache actually inclusive? Don't see no rbtree 
> manipulation on migration to and from last_active/rbtree..

Both. Inclusive in the sense that both last_active and its timeline slot
in the rbtree may be active tracking different requests and so receive
retirement callbacks independently. Exclusive in that we don't store
last_active in the cache slot and in the rbtree.
 
> And since rbtree lookup is always for the last_active context id, you 
> would otherwise never hit the the "goto replace" path.
> 
> How do you ever look up an id which is not cached in last_active then?

We don't. We only lookup on evicting a still active request from
last_active. The MRU recent request always goes into last_active.
-Chris
Tvrtko Ursulin July 4, 2018, 12:30 p.m. UTC | #4
On 04/07/2018 12:47, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-07-04 12:34:04)
>>
>> On 04/07/2018 10:39, Tvrtko Ursulin wrote:
>>>
>>> On 04/07/2018 09:34, Chris Wilson wrote:
>>>> Using a VMA on more than one timeline concurrently is the exception
>>>> rather than the rule (using it concurrently on multiple engines). As we
>>>> expect to only use one active tracker, store the most recently used
>>>> tracker inside the i915_vma itself and only fallback to the rbtree if
>>>> we need a second or more concurrent active trackers.
>>>>
>>>> v2: Comments on how we overwrite any existing last_active cache.
>>>>
>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_vma.c | 50 +++++++++++++++++++++++++++++++--
>>>>    drivers/gpu/drm/i915/i915_vma.h |  1 +
>>>>    2 files changed, 49 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c
>>>> b/drivers/gpu/drm/i915/i915_vma.c
>>>> index cd94ffc7f079..33925e00f7e8 100644
>>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>>> @@ -119,6 +119,12 @@ i915_vma_retire(struct i915_gem_active *base,
>>>> struct i915_request *rq)
>>>>        __i915_vma_retire(active->vma, rq);
>>>>    }
>>>> +static void
>>>> +i915_vma_last_retire(struct i915_gem_active *base, struct
>>>> i915_request *rq)
>>>> +{
>>>> +    __i915_vma_retire(container_of(base, struct i915_vma,
>>>> last_active), rq);
>>>> +}
>>>> +
>>>>    static struct i915_vma *
>>>>    vma_create(struct drm_i915_gem_object *obj,
>>>>           struct i915_address_space *vm,
>>>> @@ -136,6 +142,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>>>        vma->active = RB_ROOT;
>>>> +    init_request_active(&vma->last_active, i915_vma_last_retire);
>>>>        init_request_active(&vma->last_fence, NULL);
>>>>        vma->vm = vm;
>>>>        vma->ops = &vm->vma_ops;
>>>> @@ -895,6 +902,22 @@ static struct i915_gem_active
>>>> *lookup_active(struct i915_vma *vma, u64 idx)
>>>>    {
>>>>        struct i915_vma_active *active;
>>>>        struct rb_node **p, *parent;
>>>> +    struct i915_request *old;
>>>> +
>>>> +    /*
>>>> +     * We track the most recently used timeline to skip a rbtree search
>>>> +     * for the common case, under typical loads we never need the rbtree
>>>> +     * at all. We can reuse the last_active slot if it is empty, that is
>>>> +     * after the previous activity has been retired, or if the active
>>>> +     * matches the current timeline.
>>>> +     */
>>>> +    old = i915_gem_active_raw(&vma->last_active,
>>>> +                  &vma->vm->i915->drm.struct_mutex);
>>>> +    if (!old || old->fence.context == idx)
>>>> +        goto out;
>>>> +
>>>> +    /* Move the currently active fence into the rbtree */
>>>> +    idx = old->fence.context;
>>>>        parent = NULL;
>>>>        p = &vma->active.rb_node;
>>>> @@ -903,7 +926,7 @@ static struct i915_gem_active
>>>> *lookup_active(struct i915_vma *vma, u64 idx)
>>>>            active = rb_entry(parent, struct i915_vma_active, node);
>>>>            if (active->timeline == idx)
>>>> -            return &active->base;
>>>> +            goto replace;
>>>>            if (active->timeline < idx)
>>>>                p = &parent->rb_right;
>>>> @@ -922,7 +945,25 @@ static struct i915_gem_active
>>>> *lookup_active(struct i915_vma *vma, u64 idx)
>>>>        rb_link_node(&active->node, parent, p);
>>>>        rb_insert_color(&active->node, &vma->active);
>>>> -    return &active->base;
>>>> +replace:
>>>> +    /*
>>>> +     * Overwrite the previous active slot in the rbtree with
>>>> last_active,
>>>> +     * leaving last_active zeroed. If the previous slot is still active,
>>>> +     * we must be careful as we now only expect to recieve one retire
>>>
>>> typo in receive
>>>
>>>> +     * callback not two, and so much undo the active counting for the
>>>> +     * overwritten slot.
>>>> +     */
>>>> +    if (i915_gem_active_isset(&active->base)) {
>>>> +        __list_del_entry(&active->base.link);
>>>> +        vma->active_count--;
>>>   > +        GEM_BUG_ON(!vma->active_count);
>>>
>>> I still don't get this. The cache is exclusive, so when transferring a
>>> record from rbtree to last_active, why do we need to decrement the
>>> vma->active_count here? Don't get the part in the comment about two
>>> retires - do you really sometimes expect two - ie cache is not exclusive?
>>>
>>> But the fact that lookup of a cached entry is a straight return, meaning
>>> vma->active_count is manipulated elsewhere, makes me think it is
>>> avoidable messing with it on this path as well.
>>>
>>> Maybe the separation of duties between the callers and this function
>>> needs to be stronger.
>>
>> Hmm or is your cache actually inclusive? Don't see no rbtree
>> manipulation on migration to and from last_active/rbtree..
> 
> Both. Inclusive in the sense that both last_active and its timeline slot
> in the rbtree may be active tracking different requests and so receive
> retirement callbacks independently. Exclusive in that we don't store
> last_active in the cache slot and in the rbtree.

I don't know how to reconcile between two sentences (statements). They 
seem in contradiction. :(
>> And since rbtree lookup is always for the last_active context id, you
>> would otherwise never hit the "goto replace" path.
>>
>> How do you ever look up an id which is not cached in last_active then?
> 
> We don't. We only lookup on evicting a still active request from
> last_active. The MRU recent request always goes into last_active.

Sorry lost, another day maybe.

Regards,

Tvrtko
Tvrtko Ursulin July 5, 2018, 11:38 a.m. UTC | #5
On 04/07/2018 09:34, Chris Wilson wrote:
> Using a VMA on more than one timeline concurrently is the exception
> rather than the rule (using it concurrently on multiple engines). As we
> expect to only use one active tracker, store the most recently used
> tracker inside the i915_vma itself and only fallback to the rbtree if
> we need a second or more concurrent active trackers.
> 
> v2: Comments on how we overwrite any existing last_active cache.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_vma.c | 50 +++++++++++++++++++++++++++++++--
>   drivers/gpu/drm/i915/i915_vma.h |  1 +
>   2 files changed, 49 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index cd94ffc7f079..33925e00f7e8 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -119,6 +119,12 @@ i915_vma_retire(struct i915_gem_active *base, struct i915_request *rq)
>   	__i915_vma_retire(active->vma, rq);
>   }
>   
> +static void
> +i915_vma_last_retire(struct i915_gem_active *base, struct i915_request *rq)
> +{
> +	__i915_vma_retire(container_of(base, struct i915_vma, last_active), rq);
> +}
> +
>   static struct i915_vma *
>   vma_create(struct drm_i915_gem_object *obj,
>   	   struct i915_address_space *vm,
> @@ -136,6 +142,7 @@ vma_create(struct drm_i915_gem_object *obj,
>   
>   	vma->active = RB_ROOT;
>   
> +	init_request_active(&vma->last_active, i915_vma_last_retire);
>   	init_request_active(&vma->last_fence, NULL);
>   	vma->vm = vm;
>   	vma->ops = &vm->vma_ops;
> @@ -895,6 +902,22 @@ static struct i915_gem_active *lookup_active(struct i915_vma *vma, u64 idx)
>   {
>   	struct i915_vma_active *active;
>   	struct rb_node **p, *parent;
> +	struct i915_request *old;
> +
> +	/*
> +	 * We track the most recently used timeline to skip a rbtree search
> +	 * for the common case, under typical loads we never need the rbtree
> +	 * at all. We can reuse the last_active slot if it is empty, that is
> +	 * after the previous activity has been retired, or if the active
> +	 * matches the current timeline.
> +	 */
> +	old = i915_gem_active_raw(&vma->last_active,
> +				  &vma->vm->i915->drm.struct_mutex);
> +	if (!old || old->fence.context == idx)
> +		goto out;

Is the situation that retire can be out of order relative to 
move_to_active? In other words, last_active can retire before the rbtree 
record, and so the following new move_to_active will find last_active 
empty and so could create a double entry for the same timeline?

Avoiding that would defeat the caching, unless when last_active is 
available we also check the tree, *if* the vma->active_count > 0?

That way we avoid creating duplicate entries.

But would still need to pull out this tree entry into last_active after 
the fact.

Regards,

Tvrtko

> +
> +	/* Move the currently active fence into the rbtree */
> +	idx = old->fence.context;
>   
>   	parent = NULL;
>   	p = &vma->active.rb_node;
> @@ -903,7 +926,7 @@ static struct i915_gem_active *lookup_active(struct i915_vma *vma, u64 idx)
>   
>   		active = rb_entry(parent, struct i915_vma_active, node);
>   		if (active->timeline == idx)
> -			return &active->base;
> +			goto replace;
>   
>   		if (active->timeline < idx)
>   			p = &parent->rb_right;
> @@ -922,7 +945,25 @@ static struct i915_gem_active *lookup_active(struct i915_vma *vma, u64 idx)
>   	rb_link_node(&active->node, parent, p);
>   	rb_insert_color(&active->node, &vma->active);
>   
> -	return &active->base;
> +replace:
> +	/*
> +	 * Overwrite the previous active slot in the rbtree with last_active,
> +	 * leaving last_active zeroed. If the previous slot is still active,
> +	 * we must be careful as we now only expect to recieve one retire
> +	 * callback not two, and so much undo the active counting for the
> +	 * overwritten slot.
> +	 */
> +	if (i915_gem_active_isset(&active->base)) {
> +		__list_del_entry(&active->base.link);
> +		vma->active_count--;
> +		GEM_BUG_ON(!vma->active_count);
> +	}
> +	GEM_BUG_ON(list_empty(&vma->last_active.link));
> +	list_replace_init(&vma->last_active.link, &active->base.link);
> +	active->base.request = fetch_and_zero(&vma->last_active.request);
> +
> +out:
> +	return &vma->last_active;
>   }
>   
>   int i915_vma_move_to_active(struct i915_vma *vma,
> @@ -1002,6 +1043,11 @@ int i915_vma_unbind(struct i915_vma *vma)
>   		 */
>   		__i915_vma_pin(vma);
>   
> +		ret = i915_gem_active_retire(&vma->last_active,
> +					     &vma->vm->i915->drm.struct_mutex);
> +		if (ret)
> +			goto unpin;
> +
>   		rbtree_postorder_for_each_entry_safe(active, n,
>   						     &vma->active, node) {
>   			ret = i915_gem_active_retire(&active->base,
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index c297b0a0dc47..f06d66377107 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -97,6 +97,7 @@ struct i915_vma {
>   
>   	unsigned int active_count;
>   	struct rb_root active;
> +	struct i915_gem_active last_active;
>   	struct i915_gem_active last_fence;
>   
>   	/**
>
Chris Wilson July 5, 2018, 12:02 p.m. UTC | #6
Quoting Tvrtko Ursulin (2018-07-05 12:38:46)
> 
> On 04/07/2018 09:34, Chris Wilson wrote:
> > Using a VMA on more than one timeline concurrently is the exception
> > rather than the rule (using it concurrently on multiple engines). As we
> > expect to only use one active tracker, store the most recently used
> > tracker inside the i915_vma itself and only fallback to the rbtree if
> > we need a second or more concurrent active trackers.
> > 
> > v2: Comments on how we overwrite any existing last_active cache.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_vma.c | 50 +++++++++++++++++++++++++++++++--
> >   drivers/gpu/drm/i915/i915_vma.h |  1 +
> >   2 files changed, 49 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> > index cd94ffc7f079..33925e00f7e8 100644
> > --- a/drivers/gpu/drm/i915/i915_vma.c
> > +++ b/drivers/gpu/drm/i915/i915_vma.c
> > @@ -119,6 +119,12 @@ i915_vma_retire(struct i915_gem_active *base, struct i915_request *rq)
> >       __i915_vma_retire(active->vma, rq);
> >   }
> >   
> > +static void
> > +i915_vma_last_retire(struct i915_gem_active *base, struct i915_request *rq)
> > +{
> > +     __i915_vma_retire(container_of(base, struct i915_vma, last_active), rq);
> > +}
> > +
> >   static struct i915_vma *
> >   vma_create(struct drm_i915_gem_object *obj,
> >          struct i915_address_space *vm,
> > @@ -136,6 +142,7 @@ vma_create(struct drm_i915_gem_object *obj,
> >   
> >       vma->active = RB_ROOT;
> >   
> > +     init_request_active(&vma->last_active, i915_vma_last_retire);
> >       init_request_active(&vma->last_fence, NULL);
> >       vma->vm = vm;
> >       vma->ops = &vm->vma_ops;
> > @@ -895,6 +902,22 @@ static struct i915_gem_active *lookup_active(struct i915_vma *vma, u64 idx)
> >   {
> >       struct i915_vma_active *active;
> >       struct rb_node **p, *parent;
> > +     struct i915_request *old;
> > +
> > +     /*
> > +      * We track the most recently used timeline to skip a rbtree search
> > +      * for the common case, under typical loads we never need the rbtree
> > +      * at all. We can reuse the last_active slot if it is empty, that is
> > +      * after the previous activity has been retired, or if the active
> > +      * matches the current timeline.
> > +      */
> > +     old = i915_gem_active_raw(&vma->last_active,
> > +                               &vma->vm->i915->drm.struct_mutex);
> > +     if (!old || old->fence.context == idx)
> > +             goto out;
> 
> Is the situation that retire can be out of order relative to 
> move_to_active? In other words, last_active can retire before the rbtree 
> record, and so the following new move_to_active will find last_active 
> empty and so could create a double entry for the same timeline?

We don't mind a double entry, and do expect that last_active and the
rbtree entry will still be active, tracking different requests.
-Chris
Tvrtko Ursulin July 5, 2018, 12:29 p.m. UTC | #7
On 05/07/2018 13:02, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-07-05 12:38:46)
>>
>> On 04/07/2018 09:34, Chris Wilson wrote:
>>> Using a VMA on more than one timeline concurrently is the exception
>>> rather than the rule (using it concurrently on multiple engines). As we
>>> expect to only use one active tracker, store the most recently used
>>> tracker inside the i915_vma itself and only fallback to the rbtree if
>>> we need a second or more concurrent active trackers.
>>>
>>> v2: Comments on how we overwrite any existing last_active cache.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_vma.c | 50 +++++++++++++++++++++++++++++++--
>>>    drivers/gpu/drm/i915/i915_vma.h |  1 +
>>>    2 files changed, 49 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
>>> index cd94ffc7f079..33925e00f7e8 100644
>>> --- a/drivers/gpu/drm/i915/i915_vma.c
>>> +++ b/drivers/gpu/drm/i915/i915_vma.c
>>> @@ -119,6 +119,12 @@ i915_vma_retire(struct i915_gem_active *base, struct i915_request *rq)
>>>        __i915_vma_retire(active->vma, rq);
>>>    }
>>>    
>>> +static void
>>> +i915_vma_last_retire(struct i915_gem_active *base, struct i915_request *rq)
>>> +{
>>> +     __i915_vma_retire(container_of(base, struct i915_vma, last_active), rq);
>>> +}
>>> +
>>>    static struct i915_vma *
>>>    vma_create(struct drm_i915_gem_object *obj,
>>>           struct i915_address_space *vm,
>>> @@ -136,6 +142,7 @@ vma_create(struct drm_i915_gem_object *obj,
>>>    
>>>        vma->active = RB_ROOT;
>>>    
>>> +     init_request_active(&vma->last_active, i915_vma_last_retire);
>>>        init_request_active(&vma->last_fence, NULL);
>>>        vma->vm = vm;
>>>        vma->ops = &vm->vma_ops;
>>> @@ -895,6 +902,22 @@ static struct i915_gem_active *lookup_active(struct i915_vma *vma, u64 idx)
>>>    {
>>>        struct i915_vma_active *active;
>>>        struct rb_node **p, *parent;
>>> +     struct i915_request *old;
>>> +
>>> +     /*
>>> +      * We track the most recently used timeline to skip a rbtree search
>>> +      * for the common case, under typical loads we never need the rbtree
>>> +      * at all. We can reuse the last_active slot if it is empty, that is
>>> +      * after the previous activity has been retired, or if the active
>>> +      * matches the current timeline.
>>> +      */
>>> +     old = i915_gem_active_raw(&vma->last_active,
>>> +                               &vma->vm->i915->drm.struct_mutex);
>>> +     if (!old || old->fence.context == idx)
>>> +             goto out;
>>
>> Is the situation that retire can be out of order relative to
>> move_to_active? In other words, last_active can retire before the rbtree
>> record, and so the following new move_to_active will find last_active
>> empty and so could create a double entry for the same timeline?
> 
> We don't mind a double entry, and do expect that last_active and the
> rbtree entry will still be active, tracking different requests.

Maybe I mind double entries, if avoiding them would make the code easier 
to understand. :) Or not that we don't mind, but need them? Different 
requests you say, but on same timeline or?

Regards,

Tvrtko
Chris Wilson July 5, 2018, 12:48 p.m. UTC | #8
Quoting Tvrtko Ursulin (2018-07-05 13:29:44)
> 
> On 05/07/2018 13:02, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-07-05 12:38:46)
> >>
> >> On 04/07/2018 09:34, Chris Wilson wrote:
> >>> Using a VMA on more than one timeline concurrently is the exception
> >>> rather than the rule (using it concurrently on multiple engines). As we
> >>> expect to only use one active tracker, store the most recently used
> >>> tracker inside the i915_vma itself and only fallback to the rbtree if
> >>> we need a second or more concurrent active trackers.
> >>>
> >>> v2: Comments on how we overwrite any existing last_active cache.
> >>>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/i915_vma.c | 50 +++++++++++++++++++++++++++++++--
> >>>    drivers/gpu/drm/i915/i915_vma.h |  1 +
> >>>    2 files changed, 49 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> >>> index cd94ffc7f079..33925e00f7e8 100644
> >>> --- a/drivers/gpu/drm/i915/i915_vma.c
> >>> +++ b/drivers/gpu/drm/i915/i915_vma.c
> >>> @@ -119,6 +119,12 @@ i915_vma_retire(struct i915_gem_active *base, struct i915_request *rq)
> >>>        __i915_vma_retire(active->vma, rq);
> >>>    }
> >>>    
> >>> +static void
> >>> +i915_vma_last_retire(struct i915_gem_active *base, struct i915_request *rq)
> >>> +{
> >>> +     __i915_vma_retire(container_of(base, struct i915_vma, last_active), rq);
> >>> +}
> >>> +
> >>>    static struct i915_vma *
> >>>    vma_create(struct drm_i915_gem_object *obj,
> >>>           struct i915_address_space *vm,
> >>> @@ -136,6 +142,7 @@ vma_create(struct drm_i915_gem_object *obj,
> >>>    
> >>>        vma->active = RB_ROOT;
> >>>    
> >>> +     init_request_active(&vma->last_active, i915_vma_last_retire);
> >>>        init_request_active(&vma->last_fence, NULL);
> >>>        vma->vm = vm;
> >>>        vma->ops = &vm->vma_ops;
> >>> @@ -895,6 +902,22 @@ static struct i915_gem_active *lookup_active(struct i915_vma *vma, u64 idx)
> >>>    {
> >>>        struct i915_vma_active *active;
> >>>        struct rb_node **p, *parent;
> >>> +     struct i915_request *old;
> >>> +
> >>> +     /*
> >>> +      * We track the most recently used timeline to skip a rbtree search
> >>> +      * for the common case, under typical loads we never need the rbtree
> >>> +      * at all. We can reuse the last_active slot if it is empty, that is
> >>> +      * after the previous activity has been retired, or if the active
> >>> +      * matches the current timeline.
> >>> +      */
> >>> +     old = i915_gem_active_raw(&vma->last_active,
> >>> +                               &vma->vm->i915->drm.struct_mutex);
> >>> +     if (!old || old->fence.context == idx)
> >>> +             goto out;
> >>
> >> Is the situation that retire can be out of order relative to
> >> move_to_active? In other words, last_active can retire before the rbtree
> >> record, and so the following new move_to_active will find last_active
> >> empty and so could create a double entry for the same timeline?
> > 
> > We don't mind a double entry, and do expect that last_active and the
> > rbtree entry will still be active, tracking different requests.
> 
> Maybe I mind double entries, if avoiding them would make the code easier 
> to understand. :) Or not that we don't mind, but need them? Different 
> requests you say, but on same timeline or?

Same timeline, we are indexing by timeline after all. Having the double
entry avoids having to search the rbtree for every new request, and even
the radixtree lookup was a noticeable wart in the profiles.

Ok, not every new request, only on changing, do you then need to lookup
the new request to promote it to cached, but still need to insert the old
cached request. That's more code just to avoid the isset() replacement.
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index cd94ffc7f079..33925e00f7e8 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -119,6 +119,12 @@  i915_vma_retire(struct i915_gem_active *base, struct i915_request *rq)
 	__i915_vma_retire(active->vma, rq);
 }
 
+static void
+i915_vma_last_retire(struct i915_gem_active *base, struct i915_request *rq)
+{
+	__i915_vma_retire(container_of(base, struct i915_vma, last_active), rq);
+}
+
 static struct i915_vma *
 vma_create(struct drm_i915_gem_object *obj,
 	   struct i915_address_space *vm,
@@ -136,6 +142,7 @@  vma_create(struct drm_i915_gem_object *obj,
 
 	vma->active = RB_ROOT;
 
+	init_request_active(&vma->last_active, i915_vma_last_retire);
 	init_request_active(&vma->last_fence, NULL);
 	vma->vm = vm;
 	vma->ops = &vm->vma_ops;
@@ -895,6 +902,22 @@  static struct i915_gem_active *lookup_active(struct i915_vma *vma, u64 idx)
 {
 	struct i915_vma_active *active;
 	struct rb_node **p, *parent;
+	struct i915_request *old;
+
+	/*
+	 * We track the most recently used timeline to skip a rbtree search
+	 * for the common case, under typical loads we never need the rbtree
+	 * at all. We can reuse the last_active slot if it is empty, that is
+	 * after the previous activity has been retired, or if the active
+	 * matches the current timeline.
+	 */
+	old = i915_gem_active_raw(&vma->last_active,
+				  &vma->vm->i915->drm.struct_mutex);
+	if (!old || old->fence.context == idx)
+		goto out;
+
+	/* Move the currently active fence into the rbtree */
+	idx = old->fence.context;
 
 	parent = NULL;
 	p = &vma->active.rb_node;
@@ -903,7 +926,7 @@  static struct i915_gem_active *lookup_active(struct i915_vma *vma, u64 idx)
 
 		active = rb_entry(parent, struct i915_vma_active, node);
 		if (active->timeline == idx)
-			return &active->base;
+			goto replace;
 
 		if (active->timeline < idx)
 			p = &parent->rb_right;
@@ -922,7 +945,25 @@  static struct i915_gem_active *lookup_active(struct i915_vma *vma, u64 idx)
 	rb_link_node(&active->node, parent, p);
 	rb_insert_color(&active->node, &vma->active);
 
-	return &active->base;
+replace:
+	/*
+	 * Overwrite the previous active slot in the rbtree with last_active,
+	 * leaving last_active zeroed. If the previous slot is still active,
+	 * we must be careful as we now only expect to receive one retire
+	 * callback not two, and so must undo the active counting for the
+	 * overwritten slot.
+	 */
+	if (i915_gem_active_isset(&active->base)) {
+		__list_del_entry(&active->base.link);
+		vma->active_count--;
+		GEM_BUG_ON(!vma->active_count);
+	}
+	GEM_BUG_ON(list_empty(&vma->last_active.link));
+	list_replace_init(&vma->last_active.link, &active->base.link);
+	active->base.request = fetch_and_zero(&vma->last_active.request);
+
+out:
+	return &vma->last_active;
 }
 
 int i915_vma_move_to_active(struct i915_vma *vma,
@@ -1002,6 +1043,11 @@  int i915_vma_unbind(struct i915_vma *vma)
 		 */
 		__i915_vma_pin(vma);
 
+		ret = i915_gem_active_retire(&vma->last_active,
+					     &vma->vm->i915->drm.struct_mutex);
+		if (ret)
+			goto unpin;
+
 		rbtree_postorder_for_each_entry_safe(active, n,
 						     &vma->active, node) {
 			ret = i915_gem_active_retire(&active->base,
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index c297b0a0dc47..f06d66377107 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -97,6 +97,7 @@  struct i915_vma {
 
 	unsigned int active_count;
 	struct rb_root active;
+	struct i915_gem_active last_active;
 	struct i915_gem_active last_fence;
 
 	/**