
[v2] drm/i915: Shrink the GEM kmem_caches upon idling

Message ID 20180116130519.16255-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson Jan. 16, 2018, 1:05 p.m. UTC
When we finally decide the gpu is idle, that is a good time to shrink
our kmem_caches.

v2: Comment upon the random sprinkling of rcu_barrier() inside the idle
worker.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

Comments

Tvrtko Ursulin Jan. 16, 2018, 3:12 p.m. UTC | #1
On 16/01/2018 13:05, Chris Wilson wrote:
> When we finally decide the gpu is idle, that is a good time to shrink
> our kmem_caches.
> 
> v2: Comment upon the random sprinkling of rcu_barrier() inside the idle
> worker.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c | 30 ++++++++++++++++++++++++++++++
>   1 file changed, 30 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 335731c93b4a..61b13fdfaa71 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4716,6 +4716,21 @@ i915_gem_retire_work_handler(struct work_struct *work)
>   	}
>   }
>   
> +static void shrink_caches(struct drm_i915_private *i915)
> +{
> +	/*
> +	 * kmem_cache_shrink() discards empty slabs and reorders partially
> +	 * filled slabs to prioritise allocating from the mostly full slabs,
> +	 * with the aim of reducing fragmentation.
> +	 */
> +	kmem_cache_shrink(i915->priorities);
> +	kmem_cache_shrink(i915->dependencies);
> +	kmem_cache_shrink(i915->requests);
> +	kmem_cache_shrink(i915->luts);
> +	kmem_cache_shrink(i915->vmas);
> +	kmem_cache_shrink(i915->objects);
> +}
> +
>   static inline bool
>   new_requests_since_last_retire(const struct drm_i915_private *i915)
>   {
> @@ -4803,6 +4818,21 @@ i915_gem_idle_work_handler(struct work_struct *work)
>   		GEM_BUG_ON(!dev_priv->gt.awake);
>   		i915_queue_hangcheck(dev_priv);
>   	}
> +
> +	/*
> +	 * We use magical TYPESAFE_BY_RCU kmem_caches whose pages are not
> +	 * returned to the system immediately but only after an RCU grace
> +	 * period. We want to encourage such pages to be returned and so
> +	 * incorporate an RCU barrier here to provide some rate limiting
> +	 * of the driver and flush the old pages before we free a new batch
> +	 * from the next round of shrinking.
> +	 */
> +	rcu_barrier();

Should this go into the conditional below? I don't think it makes a 
difference in practice, but it may be more logical.

> +
> +	if (!new_requests_since_last_retire(dev_priv)) {
> +		__i915_gem_free_work(&dev_priv->mm.free_work);

I thought for a bit about whether re-using the worker from here is 
completely fine, and I think it is. We expect only one pass when called 
from here, so need_resched will be correctly neutralized/not-relevant 
on this path. Hm, unless we consider mmap_gtt users... we could still 
have new objects appearing on the free_list after the 1st pass, and 
then need_resched might kick us out. What do you think?
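
(For reference, __i915_gem_free_work() drains the freed-object list in 
roughly this shape - a simplified sketch that omits the free_lock 
handling:

	while ((freed = llist_del_all(&i915->mm.free_list))) {
		/* Free everything batched up so far... */
		__i915_gem_free_objects(i915, freed);
		/* ...but yield if we have been running for too long. */
		if (need_resched())
			return;
	}

so objects freed between passes - e.g. by mmap_gtt users - keep the 
loop going until need_resched() kicks in.)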

Regards,

Tvrtko

> +		shrink_caches(dev_priv);
> +	}
>   }
>   
>   int i915_gem_suspend(struct drm_i915_private *dev_priv)
>
Tvrtko Ursulin Jan. 16, 2018, 3:16 p.m. UTC | #2
On 16/01/2018 15:12, Tvrtko Ursulin wrote:
> 
> On 16/01/2018 13:05, Chris Wilson wrote:
>> When we finally decide the gpu is idle, that is a good time to shrink
>> our kmem_caches.
>>
>> v2: Comment upon the random sprinkling of rcu_barrier() inside the idle
>> worker.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_gem.c | 30 ++++++++++++++++++++++++++++++
>>   1 file changed, 30 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 335731c93b4a..61b13fdfaa71 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -4716,6 +4716,21 @@ i915_gem_retire_work_handler(struct work_struct 
>> *work)
>>       }
>>   }
>> +static void shrink_caches(struct drm_i915_private *i915)
>> +{
>> +    /*
>> +     * kmem_cache_shrink() discards empty slabs and reorders partially
>> +     * filled slabs to prioritise allocating from the mostly full slabs,
>> +     * with the aim of reducing fragmentation.
>> +     */
>> +    kmem_cache_shrink(i915->priorities);
>> +    kmem_cache_shrink(i915->dependencies);
>> +    kmem_cache_shrink(i915->requests);
>> +    kmem_cache_shrink(i915->luts);
>> +    kmem_cache_shrink(i915->vmas);
>> +    kmem_cache_shrink(i915->objects);
>> +}
>> +
>>   static inline bool
>>   new_requests_since_last_retire(const struct drm_i915_private *i915)
>>   {
>> @@ -4803,6 +4818,21 @@ i915_gem_idle_work_handler(struct work_struct 
>> *work)
>>           GEM_BUG_ON(!dev_priv->gt.awake);
>>           i915_queue_hangcheck(dev_priv);
>>       }
>> +
>> +    /*
>> +     * We use magical TYPESAFE_BY_RCU kmem_caches whose pages are not
>> +     * returned to the system immediately but only after an RCU grace
>> +     * period. We want to encourage such pages to be returned and so
>> +     * incorporate an RCU barrier here to provide some rate limiting
>> +     * of the driver and flush the old pages before we free a new batch
>> +     * from the next round of shrinking.
>> +     */
>> +    rcu_barrier();
> 
> Should this go into the conditional below? I don't think it makes a 
> difference in practice, but it may be more logical.
> 
>> +
>> +    if (!new_requests_since_last_retire(dev_priv)) {
>> +        __i915_gem_free_work(&dev_priv->mm.free_work);
> 
> I thought for a bit about whether re-using the worker from here is 
> completely fine, and I think it is. We expect only one pass when called 
> from here, so need_resched will be correctly neutralized/not-relevant 
> on this path. Hm, unless we consider mmap_gtt users... we could still 
> have new objects appearing on the free_list after the 1st pass, and 
> then need_resched might kick us out. What do you think?

This also ties back to what I wrote in the earlier reply - do we want to 
shrink the obj and vma caches from here? It may be colliding with 
mmap_gtt operations. But it sounds appealing to tidy them, and I can't 
think of any other convenient point. Given how we are de-prioritising 
mmap_gtt it's probably fine.

> 
> Regards,
> 
> Tvrtko
> 
>> +        shrink_caches(dev_priv);
>> +    }
>>   }
>>   int i915_gem_suspend(struct drm_i915_private *dev_priv)
>>
Chris Wilson Jan. 16, 2018, 3:21 p.m. UTC | #3
Quoting Tvrtko Ursulin (2018-01-16 15:12:43)
> 
> On 16/01/2018 13:05, Chris Wilson wrote:
> > When we finally decide the gpu is idle, that is a good time to shrink
> > our kmem_caches.
> > 
> > v2: Comment upon the random sprinkling of rcu_barrier() inside the idle
> > worker.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c | 30 ++++++++++++++++++++++++++++++
> >   1 file changed, 30 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 335731c93b4a..61b13fdfaa71 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4716,6 +4716,21 @@ i915_gem_retire_work_handler(struct work_struct *work)
> >       }
> >   }
> >   
> > +static void shrink_caches(struct drm_i915_private *i915)
> > +{
> > +     /*
> > +      * kmem_cache_shrink() discards empty slabs and reorders partially
> > +      * filled slabs to prioritise allocating from the mostly full slabs,
> > +      * with the aim of reducing fragmentation.
> > +      */
> > +     kmem_cache_shrink(i915->priorities);
> > +     kmem_cache_shrink(i915->dependencies);
> > +     kmem_cache_shrink(i915->requests);
> > +     kmem_cache_shrink(i915->luts);
> > +     kmem_cache_shrink(i915->vmas);
> > +     kmem_cache_shrink(i915->objects);
> > +}
> > +
> >   static inline bool
> >   new_requests_since_last_retire(const struct drm_i915_private *i915)
> >   {
> > @@ -4803,6 +4818,21 @@ i915_gem_idle_work_handler(struct work_struct *work)
> >               GEM_BUG_ON(!dev_priv->gt.awake);
> >               i915_queue_hangcheck(dev_priv);
> >       }
> > +
> > +     /*
> > +      * We use magical TYPESAFE_BY_RCU kmem_caches whose pages are not
> > +      * returned to the system immediately but only after an RCU grace
> > +      * period. We want to encourage such pages to be returned and so
> > +      * incorporate an RCU barrier here to provide some rate limiting
> > +      * of the driver and flush the old pages before we free a new batch
> > +      * from the next round of shrinking.
> > +      */
> > +     rcu_barrier();
> 
> > Should this go into the conditional below? I don't think it makes a 
> > difference in practice, but it may be more logical.

My thinking was to have the check after the sleep as the state is
subject to change. I'm not concerned about the random unnecessary pauses
on this wq, since it is subject to struct_mutex delays, so was quite
happy to think of this as being "we shall only do one idle pass per RCU
grace period".

> > +
> > +     if (!new_requests_since_last_retire(dev_priv)) {
> > +             __i915_gem_free_work(&dev_priv->mm.free_work);
> 
> I thought for a bit about whether re-using the worker from here is 
> completely fine, and I think it is. We expect only one pass when called 
> from here, so need_resched will be correctly neutralized/not-relevant 
> on this path.

At present, I was only thinking about the single path. This was meant to
resemble i915_gem_drain_objects(), without the recursion :)

> Hm, unless we consider mmap_gtt users... we could still have new 
> objects appearing on the free_list after the 1st pass, and then 
> need_resched might kick us out. What do you think?

Not just mmap_gtt, any user freeing objects (coupled with RCU grace
periods). I don't think it matters if we happen to loop until the
timeslice is consumed as we are doing work that we would be doing
anyway on this i915->wq.
-Chris
Tvrtko Ursulin Jan. 16, 2018, 5:25 p.m. UTC | #4
On 16/01/2018 15:21, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-16 15:12:43)
>>
>> On 16/01/2018 13:05, Chris Wilson wrote:
>>> When we finally decide the gpu is idle, that is a good time to shrink
>>> our kmem_caches.
>>>
>>> v2: Comment upon the random sprinkling of rcu_barrier() inside the idle
>>> worker.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem.c | 30 ++++++++++++++++++++++++++++++
>>>    1 file changed, 30 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 335731c93b4a..61b13fdfaa71 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -4716,6 +4716,21 @@ i915_gem_retire_work_handler(struct work_struct *work)
>>>        }
>>>    }
>>>    
>>> +static void shrink_caches(struct drm_i915_private *i915)
>>> +{
>>> +     /*
>>> +      * kmem_cache_shrink() discards empty slabs and reorders partially
>>> +      * filled slabs to prioritise allocating from the mostly full slabs,
>>> +      * with the aim of reducing fragmentation.
>>> +      */
>>> +     kmem_cache_shrink(i915->priorities);
>>> +     kmem_cache_shrink(i915->dependencies);
>>> +     kmem_cache_shrink(i915->requests);
>>> +     kmem_cache_shrink(i915->luts);
>>> +     kmem_cache_shrink(i915->vmas);
>>> +     kmem_cache_shrink(i915->objects);
>>> +}
>>> +
>>>    static inline bool
>>>    new_requests_since_last_retire(const struct drm_i915_private *i915)
>>>    {
>>> @@ -4803,6 +4818,21 @@ i915_gem_idle_work_handler(struct work_struct *work)
>>>                GEM_BUG_ON(!dev_priv->gt.awake);
>>>                i915_queue_hangcheck(dev_priv);
>>>        }
>>> +
>>> +     /*
>>> +      * We use magical TYPESAFE_BY_RCU kmem_caches whose pages are not
>>> +      * returned to the system immediately but only after an RCU grace
>>> +      * period. We want to encourage such pages to be returned and so
>>> +      * incorporate an RCU barrier here to provide some rate limiting
>>> +      * of the driver and flush the old pages before we free a new batch
>>> +      * from the next round of shrinking.
>>> +      */
>>> +     rcu_barrier();
>>
>> Should this go into the conditional below? I don't think it makes a
>> difference in practice, but it may be more logical.
> 
> My thinking was to have the check after the sleep as the state is
> subject to change. I'm not concerned about the random unnecessary pauses
> on this wq, since it is subject to struct_mutex delays, so was quite

The delay doesn't worry me, just that it is random - neither the 
appearance of new requests nor the completion of existing ones has 
anything to do with an RCU grace period.

> happy to think of this as being "we shall only do one idle pass per RCU
> grace period".

The idle worker is probably several orders of magnitude less frequent 
than RCU grace periods, so I don't think that can be a concern.

Hm..

>>> +
>>> +     if (!new_requests_since_last_retire(dev_priv)) {
>>> +             __i915_gem_free_work(&dev_priv->mm.free_work);

... you wouldn't want to pull this up under the struct_mutex section? It 
would need a different flavour of function to be called, and some 
refactoring of the existing ones.

shrink_caches could be left here under the same check and preceded by 
rcu_barrier.
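
I.e. roughly this at the tail of the idle worker - a sketch of the 
proposed ordering only:

	if (!new_requests_since_last_retire(dev_priv)) {
		/* Flush the last batch of RCU-freed pages... */
		rcu_barrier();
		/* ...and only then trim the now-idle caches. */
		shrink_caches(dev_priv);
	}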

>> I thought for a bit if re-using the worker from here is completely fine
>> but I think it is. We expect only one pass when called from here so
>> need_resched will be correctly neutralized/not-relevant from this path.
> 
> At present, I was only thinking about the single path. This was meant to
> resemble i915_gem_drain_objects(), without the recursion :)
> 
>> Hm, unless if we consider mmap_gtt users.. so we could still have new
>> objects appearing on the free_list after the 1st pass. And then
>> need_resched might kick us out. What do you think?
> 
> Not just mmap_gtt, any user freeing objects (coupled with RCU grace
> periods). I don't think it matters if we happen to loop until the
> timeslice is consumed as we are doing work that we would be doing
> anyway on this i915->wq.

Yeah, doesn't matter - I was wondering whether we should explicitly not 
consider need_resched when called from the idle worker and only grab the 
first batch - what's currently on the freed list.
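
Something like this single-pass variant, say (a sketch only; the real 
function would need a little refactoring to expose it):

	/* Free only what has already been batched up; do not loop. */
	freed = llist_del_all(&i915->mm.free_list);
	if (freed)
		__i915_gem_free_objects(i915, freed);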

Regards,

Tvrtko
Chris Wilson Jan. 16, 2018, 5:36 p.m. UTC | #5
Quoting Tvrtko Ursulin (2018-01-16 17:25:25)
> 
> On 16/01/2018 15:21, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-01-16 15:12:43)
> >>
> >> On 16/01/2018 13:05, Chris Wilson wrote:
> >>> When we finally decide the gpu is idle, that is a good time to shrink
> >>> our kmem_caches.
> >>>
> >>> v2: Comment upon the random sprinkling of rcu_barrier() inside the idle
> >>> worker.
> >>>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/i915_gem.c | 30 ++++++++++++++++++++++++++++++
> >>>    1 file changed, 30 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>> index 335731c93b4a..61b13fdfaa71 100644
> >>> --- a/drivers/gpu/drm/i915/i915_gem.c
> >>> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >>> @@ -4716,6 +4716,21 @@ i915_gem_retire_work_handler(struct work_struct *work)
> >>>        }
> >>>    }
> >>>    
> >>> +static void shrink_caches(struct drm_i915_private *i915)
> >>> +{
> >>> +     /*
> >>> +      * kmem_cache_shrink() discards empty slabs and reorders partially
> >>> +      * filled slabs to prioritise allocating from the mostly full slabs,
> >>> +      * with the aim of reducing fragmentation.
> >>> +      */
> >>> +     kmem_cache_shrink(i915->priorities);
> >>> +     kmem_cache_shrink(i915->dependencies);
> >>> +     kmem_cache_shrink(i915->requests);
> >>> +     kmem_cache_shrink(i915->luts);
> >>> +     kmem_cache_shrink(i915->vmas);
> >>> +     kmem_cache_shrink(i915->objects);
> >>> +}
> >>> +
> >>>    static inline bool
> >>>    new_requests_since_last_retire(const struct drm_i915_private *i915)
> >>>    {
> >>> @@ -4803,6 +4818,21 @@ i915_gem_idle_work_handler(struct work_struct *work)
> >>>                GEM_BUG_ON(!dev_priv->gt.awake);
> >>>                i915_queue_hangcheck(dev_priv);
> >>>        }
> >>> +
> >>> +     /*
> >>> +      * We use magical TYPESAFE_BY_RCU kmem_caches whose pages are not
> >>> +      * returned to the system immediately but only after an RCU grace
> >>> +      * period. We want to encourage such pages to be returned and so
> >>> +      * incorporate an RCU barrier here to provide some rate limiting
> >>> +      * of the driver and flush the old pages before we free a new batch
> >>> +      * from the next round of shrinking.
> >>> +      */
> >>> +     rcu_barrier();
> >>
> >> Should this go into the conditional below? I don't think it makes a
> >> difference in practice, but it may be more logical.
> > 
> > My thinking was to have the check after the sleep as the state is
> > subject to change. I'm not concerned about the random unnecessary pauses
> > on this wq, since it is subject to struct_mutex delays, so was quite
> 
> The delay doesn't worry me, just that it is random - neither the 
> appearance of new requests nor the completion of existing ones has 
> anything to do with an RCU grace period.
> 
> > happy to think of this as being "we shall only do one idle pass per RCU
> > grace period".
> 
> The idle worker is probably several orders of magnitude less frequent 
> than RCU grace periods, so I don't think that can be a concern.
> 
> Hm..
> 
> >>> +
> >>> +     if (!new_requests_since_last_retire(dev_priv)) {
> >>> +             __i915_gem_free_work(&dev_priv->mm.free_work);
> 
> ... you wouldn't want to pull this up under the struct_mutex section? It 
> would need a different flavour of function to be called, and some 
> refactoring of the existing ones.

"Some". I don't think that improves anything?

The statement of intent to me is that we throw away the caches and
excess memory if and only if we are idle. The presumption is that under
active conditions those caches are important, but if we are about to
sleep for long periods of time, we should be proactive in releasing
resources.

I can hear you about to ask if we could add a timer and wake up in 10s to
prove we were idle!
-Chris
Tvrtko Ursulin Jan. 17, 2018, 10:18 a.m. UTC | #6
On 16/01/2018 17:36, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-01-16 17:25:25)
>>
>> On 16/01/2018 15:21, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-01-16 15:12:43)
>>>>
>>>> On 16/01/2018 13:05, Chris Wilson wrote:
>>>>> When we finally decide the gpu is idle, that is a good time to shrink
>>>>> our kmem_caches.
>>>>>
>>>>> v2: Comment upon the random sprinkling of rcu_barrier() inside the idle
>>>>> worker.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>>>>> ---
>>>>>     drivers/gpu/drm/i915/i915_gem.c | 30 ++++++++++++++++++++++++++++++
>>>>>     1 file changed, 30 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>>> index 335731c93b4a..61b13fdfaa71 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>>> @@ -4716,6 +4716,21 @@ i915_gem_retire_work_handler(struct work_struct *work)
>>>>>         }
>>>>>     }
>>>>>     
>>>>> +static void shrink_caches(struct drm_i915_private *i915)
>>>>> +{
>>>>> +     /*
>>>>> +      * kmem_cache_shrink() discards empty slabs and reorders partially
>>>>> +      * filled slabs to prioritise allocating from the mostly full slabs,
>>>>> +      * with the aim of reducing fragmentation.
>>>>> +      */
>>>>> +     kmem_cache_shrink(i915->priorities);
>>>>> +     kmem_cache_shrink(i915->dependencies);
>>>>> +     kmem_cache_shrink(i915->requests);
>>>>> +     kmem_cache_shrink(i915->luts);
>>>>> +     kmem_cache_shrink(i915->vmas);
>>>>> +     kmem_cache_shrink(i915->objects);
>>>>> +}
>>>>> +
>>>>>     static inline bool
>>>>>     new_requests_since_last_retire(const struct drm_i915_private *i915)
>>>>>     {
>>>>> @@ -4803,6 +4818,21 @@ i915_gem_idle_work_handler(struct work_struct *work)
>>>>>                 GEM_BUG_ON(!dev_priv->gt.awake);
>>>>>                 i915_queue_hangcheck(dev_priv);
>>>>>         }
>>>>> +
>>>>> +     /*
>>>>> +      * We use magical TYPESAFE_BY_RCU kmem_caches whose pages are not
>>>>> +      * returned to the system immediately but only after an RCU grace
>>>>> +      * period. We want to encourage such pages to be returned and so
>>>>> +      * incorporate an RCU barrier here to provide some rate limiting
>>>>> +      * of the driver and flush the old pages before we free a new batch
>>>>> +      * from the next round of shrinking.
>>>>> +      */
>>>>> +     rcu_barrier();
>>>>
>>>> Should this go into the conditional below? I don't think it makes a
>>>> difference in practice, but it may be more logical.
>>>
>>> My thinking was to have the check after the sleep as the state is
>>> subject to change. I'm not concerned about the random unnecessary pauses
>>> on this wq, since it is subject to struct_mutex delays, so was quite
>>
>> The delay doesn't worry me, just that it is random - neither the
>> appearance of new requests nor the completion of existing ones has
>> anything to do with an RCU grace period.
>>
>>> happy to think of this as being "we shall only do one idle pass per RCU
>>> grace period".
>>
>> The idle worker is probably several orders of magnitude less frequent
>> than RCU grace periods, so I don't think that can be a concern.
>>
>> Hm..
>>
>>>>> +
>>>>> +     if (!new_requests_since_last_retire(dev_priv)) {
>>>>> +             __i915_gem_free_work(&dev_priv->mm.free_work);
>>
>> ... you wouldn't want to pull this up under the struct_mutex section? It 
>> would need a different flavour of function to be called, and some 
>> refactoring of the existing ones.
> 
> "Some". I don't think that improves anything?
> 
> The statement of intent to me is that we throw away the caches and
> excess memory if and only if we are idle. The presumption is that under
> active conditions those caches are important, but if we are about to
> sleep for long periods of time, we should be proactive in releasing
> resources.
> 
> I can hear you about to ask if we could add a timer and wake up in 10s to
> prove we were idle!

No, pointless since this proposal already runs outside this guarantee, 
and anyway, this way or the other there is potential to disrupt the next 
client.

How about sticking in a break on new_requests_since_last_retire() into 
__i915_gem_free_work()? Would that defeat the backlog cleaning? Maybe 
conditional only when called from the idle handler?
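
Roughly like this, say - a sketch only, where "from_idle" is a 
hypothetical parameter that just the idle handler would set:

	while ((freed = llist_del_all(&i915->mm.free_list))) {
		__i915_gem_free_objects(i915, freed);
		/* On the idle path, stop as soon as the gpu is busy again. */
		if (from_idle && new_requests_since_last_retire(i915))
			break;
		if (need_resched())
			return;
	}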

Regards,

Tvrtko
Chris Wilson Jan. 18, 2018, 6:06 p.m. UTC | #7
Quoting Tvrtko Ursulin (2018-01-17 10:18:38)
> 
> On 16/01/2018 17:36, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-01-16 17:25:25)
> >>
> >> On 16/01/2018 15:21, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-01-16 15:12:43)
> >>>>
> >>>> On 16/01/2018 13:05, Chris Wilson wrote:
> >>>>> +
> >>>>> +     if (!new_requests_since_last_retire(dev_priv)) {
> >>>>> +             __i915_gem_free_work(&dev_priv->mm.free_work);
> >>
> >> ... you wouldn't want to pull this up under the struct_mutex section? It
> >> would need a different flavour of function to be called, and some
> >> refactoring of the existing ones.
> > 
> > "Some". I don't think that improves anything?
> > 
> > The statement of intent to me is that we throw away the caches and
> > excess memory if and only if we are idle. The presumption is that under
> > active conditions those caches are important, but if we are about to
> > sleep for long periods of time, we should be proactive in releasing
> > resources.
> > 
> > I can hear you about to ask if we could add a timer and wake up in 10s to
> > prove we were idle!
> 
> No, pointless since this proposal already runs outside this guarantee, 
> and anyway, this way or the other there is potential to disrupt the next 
> client.
> 
> How about sticking in a break on new_requests_since_last_retire() into 
> __i915_gem_free_work()? Would that defeat the backlog cleaning? Maybe 
> conditional only when called from the idle handler?

__i915_gem_free_work() is a distraction, since it is just clearing the
backlog of freed objects and shouldn't be affecting the cache
optimisations for the next/concurrent client. Let me try rearranging the
flow here.
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 335731c93b4a..61b13fdfaa71 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4716,6 +4716,21 @@  i915_gem_retire_work_handler(struct work_struct *work)
 	}
 }
 
+static void shrink_caches(struct drm_i915_private *i915)
+{
+	/*
+	 * kmem_cache_shrink() discards empty slabs and reorders partially
+	 * filled slabs to prioritise allocating from the mostly full slabs,
+	 * with the aim of reducing fragmentation.
+	 */
+	kmem_cache_shrink(i915->priorities);
+	kmem_cache_shrink(i915->dependencies);
+	kmem_cache_shrink(i915->requests);
+	kmem_cache_shrink(i915->luts);
+	kmem_cache_shrink(i915->vmas);
+	kmem_cache_shrink(i915->objects);
+}
+
 static inline bool
 new_requests_since_last_retire(const struct drm_i915_private *i915)
 {
@@ -4803,6 +4818,21 @@  i915_gem_idle_work_handler(struct work_struct *work)
 		GEM_BUG_ON(!dev_priv->gt.awake);
 		i915_queue_hangcheck(dev_priv);
 	}
+
+	/*
+	 * We use magical TYPESAFE_BY_RCU kmem_caches whose pages are not
+	 * returned to the system immediately but only after an RCU grace
+	 * period. We want to encourage such pages to be returned and so
+	 * incorporate an RCU barrier here to provide some rate limiting
+	 * of the driver and flush the old pages before we free a new batch
+	 * from the next round of shrinking.
+	 */
+	rcu_barrier();
+
+	if (!new_requests_since_last_retire(dev_priv)) {
+		__i915_gem_free_work(&dev_priv->mm.free_work);
+		shrink_caches(dev_priv);
+	}
 }
 
 int i915_gem_suspend(struct drm_i915_private *dev_priv)