
drm/i915: Revert shrinker changes from "Track unbound pages"

Message ID 1357754229-26092-1-git-send-email-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived

Commit Message

Daniel Vetter Jan. 9, 2013, 5:57 p.m. UTC
This partially reverts

commit 6c085a728cf000ac1865d66f8c9b52935558b328
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Aug 20 11:40:46 2012 +0200

    drm/i915: Track unbound pages

Closer inspection of that patch revealed a bunch of unrelated changes
in the shrinker:
- The shrinker count is now in pages instead of objects.
- For counting the shrinkable objects the old code only looked at the
  inactive list; the new code looks at all bound objects (including
  pinned ones). That is obviously in addition to the new unbound list.
- The shrinker count is no longer scaled with
  sysctl_vfs_cache_pressure.
- When actually shrinking objects, the old code first dropped
  purgeable objects, then normal (inactive) objects. Only then did it,
  in a last-ditch effort, idle the gpu and evict everything. The new
  code omits the intermediate step of evicting normal inactive
  objects.

Save for the first change, which seems benign, the end result of all
these changes is that the shrinker is _much_ more likely to fall back
to the last-ditch resort of idling the gpu and evicting everything.
The old code could only do that if something else had evicted lots of
objects in the meantime (since without any other changes nr_to_scan
will be smaller than the object count).

Hence revert all these changes and restore the old shrinker behaviour,
with the minor adjustment that we now first scan the unbound list,
then the inactive list for each object category (purgeable or normal).
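
In pseudo-code the restored shrink order looks roughly as follows
(condensed from the patch below; the wrapper function is illustrative
only, locking and the actual page accounting are elided):

/* Condensed sketch only, not the real code: shows the order in which
 * the restored shrinker gives up memory.
 */
static void shrink_order_sketch(struct drm_i915_private *dev_priv,
				long nr_to_scan)
{
	/* 1. Drop purgeable objects, scanning the unbound list first,
	 *    then the inactive list.
	 */
	nr_to_scan -= i915_gem_purge(dev_priv, nr_to_scan);

	/* 2. Drop normal (non-purgeable) objects in the same list order. */
	if (nr_to_scan > 0)
		nr_to_scan -= __i915_gem_shrink(dev_priv, nr_to_scan, false);

	/* 3. Only as a last resort, idle the gpu and evict everything. */
	if (nr_to_scan > 0)
		i915_gem_shrink_all(dev_priv);
}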

A similar patch has been tested by a few people affected by the gen4/5
hangs which started to appear in 3.7 and which some people bisected to
the "drm/i915: Track unbound pages" commit. But disabling the unbound
logic alone didn't change things at all.

References: https://bugs.freedesktop.org/show_bug.cgi?id=55984
References: https://bugs.freedesktop.org/show_bug.cgi?id=57122
References: https://bugs.freedesktop.org/show_bug.cgi?id=56916
References: https://bugs.freedesktop.org/show_bug.cgi?id=57136
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

Comments

Chris Wilson Jan. 10, 2013, 12:23 a.m. UTC | #1
On Wed,  9 Jan 2013 18:57:09 +0100, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> This partially reverts
> 
> commit 6c085a728cf000ac1865d66f8c9b52935558b328
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Aug 20 11:40:46 2012 +0200
> 
>     drm/i915: Track unbound pages
> 
> Closer inspection of that patch revealed a bunch of unrelated changes
> in the shrinker:
> - The shrinker count is now in pages instead of objects.
> - For counting the shrinkable objects the old code only looked at the
>   inactive list; the new code looks at all bound objects (including
>   pinned ones). That is obviously in addition to the new unbound list.
> - The shrinker count is no longer scaled with
>   sysctl_vfs_cache_pressure.

I made a mistake and copied the wrong code in my original
implementation:

vfs_cache_pressure
------------------

Controls the tendency of the kernel to reclaim the memory which is used for
caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to
reclaim dentries and inodes at a "fair" rate with respect to pagecache and
swapcache reclaim.  Decreasing vfs_cache_pressure causes the kernel to prefer
to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
never reclaim dentries and inodes due to memory pressure and this can easily
lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.

--------------------

It's not an inode or a directory, and our pages are directly akin to the
page and buffer caches, so I should have never used vfs_cache_pressure
and you have not justified why you are adding it back.
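
For reference, the scaling under discussion is the final hunk of the
patch below, cnt / 100 * sysctl_vfs_cache_pressure. A minimal
illustration of its effect (the helper name is made up, and the snippet
assumes the sysctl declaration is in scope):

/* Illustration only: what the vfs_cache_pressure scaling does to the
 * page count the shrinker reports to the VM.
 */
static unsigned long scaled_shrinker_count(unsigned long cnt)
{
	/* With cnt = 4000 pages:
	 *   vfs_cache_pressure = 100 (default) -> 4000 pages reported
	 *   vfs_cache_pressure =  50           -> 2000 pages reported
	 *   vfs_cache_pressure = 200           -> 8000 pages reported
	 * i.e. lowering the sysctl makes the GEM caches look smaller to
	 * the VM, so they get reclaimed less aggressively.
	 */
	return cnt / 100 * sysctl_vfs_cache_pressure;
}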

> - When actually shrinking objects, the old code first dropped
>   purgeable objects, then normal (inactive) objects. Only then did it,
>   in a last-ditch effort, idle the gpu and evict everything. The new
>   code omits the intermediate step of evicting normal inactive
>   objects.

This is the crux of the papering, so just do this and call it what it is.
-Chris
Daniel Vetter Jan. 10, 2013, 9:34 a.m. UTC | #2
On Thu, Jan 10, 2013 at 1:23 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Wed,  9 Jan 2013 18:57:09 +0100, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>> This partially reverts
>>
>> commit 6c085a728cf000ac1865d66f8c9b52935558b328
>> Author: Chris Wilson <chris@chris-wilson.co.uk>
>> Date:   Mon Aug 20 11:40:46 2012 +0200
>>
>>     drm/i915: Track unbound pages
>>
>> Closer inspection of that patch revealed a bunch of unrelated changes
>> in the shrinker:
>> - The shrinker count is now in pages instead of objects.
>> - For counting the shrinkable objects the old code only looked at the
>>   inactive list; the new code looks at all bound objects (including
>>   pinned ones). That is obviously in addition to the new unbound list.
>> - The shrinker count is no longer scaled with
>>   sysctl_vfs_cache_pressure.
>
> I made a mistake and copied the wrong code in my original
> implementation:
>
> vfs_cache_pressure
> ------------------
>
> Controls the tendency of the kernel to reclaim the memory which is used for
> caching of directory and inode objects.
>
> At the default value of vfs_cache_pressure=100 the kernel will attempt to
> reclaim dentries and inodes at a "fair" rate with respect to pagecache and
> swapcache reclaim.  Decreasing vfs_cache_pressure causes the kernel to prefer
> to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
> never reclaim dentries and inodes due to memory pressure and this can easily
> lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
> causes the kernel to prefer to reclaim dentries and inodes.
>
> --------------------
>
> It's not an inode or a directory, and our pages are directly akin to the
> page and buffer caches, so I should have never used vfs_cache_pressure
> and you have not justified why you are adding it back.

Yeah, I agree that the vfs_cache_pressure tuning is rather bogus, but
I'd prefer to revert all the shrinker related changes for now. We can
remedy this in next with a quick patch easily - burned child applies
;-)

>> - When actually shrinking objects, the old code first dropped
>>   purgeable objects, then normal (inactive) objects. Only then did it,
>>   in a last-ditch effort, idle the gpu and evict everything. The new
>>   code omits the intermediate step of evicting normal inactive
>>   objects.
>
> This is the crux of the papering, so just do this and call it what it is.

Agreed, I'll clarify the commit message a bit and stress that the
referenced bugs aren't really fixed, but at least the regressions
should be resolved.
-Daniel

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 51df4ee..7c42788 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1714,7 +1714,8 @@  i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
 }
 
 static long
-i915_gem_purge(struct drm_i915_private *dev_priv, long target)
+__i915_gem_shrink(struct drm_i915_private *dev_priv, long target,
+		  bool purgeable_only)
 {
 	struct drm_i915_gem_object *obj, *next;
 	long count = 0;
@@ -1722,7 +1723,7 @@  i915_gem_purge(struct drm_i915_private *dev_priv, long target)
 	list_for_each_entry_safe(obj, next,
 				 &dev_priv->mm.unbound_list,
 				 gtt_list) {
-		if (i915_gem_object_is_purgeable(obj) &&
+		if ((i915_gem_object_is_purgeable(obj) || !purgeable_only) &&
 		    i915_gem_object_put_pages(obj) == 0) {
 			count += obj->base.size >> PAGE_SHIFT;
 			if (count >= target)
@@ -1733,7 +1734,7 @@  i915_gem_purge(struct drm_i915_private *dev_priv, long target)
 	list_for_each_entry_safe(obj, next,
 				 &dev_priv->mm.inactive_list,
 				 mm_list) {
-		if (i915_gem_object_is_purgeable(obj) &&
+		if ((i915_gem_object_is_purgeable(obj) || !purgeable_only) &&
 		    i915_gem_object_unbind(obj) == 0 &&
 		    i915_gem_object_put_pages(obj) == 0) {
 			count += obj->base.size >> PAGE_SHIFT;
@@ -1745,6 +1746,12 @@  i915_gem_purge(struct drm_i915_private *dev_priv, long target)
 	return count;
 }
 
+static long
+i915_gem_purge(struct drm_i915_private *dev_priv, long target)
+{
+	return __i915_gem_shrink(dev_priv, target, true);
+}
+
 static void
 i915_gem_shrink_all(struct drm_i915_private *dev_priv)
 {
@@ -4415,6 +4422,9 @@  i915_gem_inactive_shrink(struct shrinker *shrinker, struct shrink_control *sc)
 	if (nr_to_scan) {
 		nr_to_scan -= i915_gem_purge(dev_priv, nr_to_scan);
 		if (nr_to_scan > 0)
+			nr_to_scan -= __i915_gem_shrink(dev_priv, nr_to_scan,
+							false);
+		if (nr_to_scan > 0)
 			i915_gem_shrink_all(dev_priv);
 	}
 
@@ -4422,11 +4432,11 @@  i915_gem_inactive_shrink(struct shrinker *shrinker, struct shrink_control *sc)
 	list_for_each_entry(obj, &dev_priv->mm.unbound_list, gtt_list)
 		if (obj->pages_pin_count == 0)
 			cnt += obj->base.size >> PAGE_SHIFT;
-	list_for_each_entry(obj, &dev_priv->mm.bound_list, gtt_list)
+	list_for_each_entry(obj, &dev_priv->mm.inactive_list, mm_list)
 		if (obj->pin_count == 0 && obj->pages_pin_count == 0)
 			cnt += obj->base.size >> PAGE_SHIFT;
 
 	if (unlock)
 		mutex_unlock(&dev->struct_mutex);
-	return cnt;
+	return cnt / 100 * sysctl_vfs_cache_pressure;
 }