Message ID | 1357754229-26092-1-git-send-email-daniel.vetter@ffwll.ch (mailing list archive) |
---|---|
State | New, archived |
On Wed, 9 Jan 2013 18:57:09 +0100, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> This partially reverts
>
> commit 6c085a728cf000ac1865d66f8c9b52935558b328
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Mon Aug 20 11:40:46 2012 +0200
>
>     drm/i915: Track unbound pages
>
> Closer inspection of that patch revealed a bunch of unrelated changes
> in the shrinker:
> - The shrinker count is now in pages instead of objects.
> - For counting the shrinkable objects the old code only looked at the
>   inactive list, the new code looks at all bound objects (including
>   pinned ones). That is obviously in addition to the new unbound list.
> - The shrinker count is no longer scaled with
>   sysctl_vfs_cache_pressure.

I made a mistake and copied the wrong code in my original
implementation:

vfs_cache_pressure
------------------

Controls the tendency of the kernel to reclaim the memory which is used for
caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to
reclaim dentries and inodes at a "fair" rate with respect to pagecache and
swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
never reclaim dentries and inodes due to memory pressure and this can easily
lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.

--------------------

It's not an inode or a directory, and our pages are directly akin to the page
and buffer caches, so I should never have used vfs_cache_pressure, and you
have not justified why you are adding it back.

> - When actually shrinking objects, the old code first dropped
>   purgeable objects, then normal (inactive) objects. Only then did it,
>   in a last-ditch effort, idle the gpu and evict everything. The new
>   code omits the intermediate step of evicting normal inactive
>   objects.

This is the crux of the papering, so just do this and call it what it is.
-Chris
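For reference, the vfs_cache_pressure scaling under discussion is plain integer arithmetic on the count the shrinker reports. The standalone userspace sketch below (a hypothetical "pressure" variable stands in for the kernel's sysctl_vfs_cache_pressure; the page count is made up) shows how values of 0, 100 and 200 change what the shrinker core is told.

/* Standalone illustration of the scaling discussed above; "pressure"
 * is a stand-in for the kernel's sysctl_vfs_cache_pressure knob. */
#include <stdio.h>

static long scale_count(long cnt, unsigned int pressure)
{
	/* Same integer arithmetic as "cnt / 100 * sysctl_vfs_cache_pressure":
	 * 100 reports the count (rounded down to a multiple of 100), 0 hides
	 * the cache from the shrinker core entirely, 200 doubles it. */
	return cnt / 100 * pressure;
}

int main(void)
{
	long cnt = 12345;	/* made-up number of shrinkable pages */
	unsigned int p;

	for (p = 0; p <= 200; p += 100)
		printf("pressure=%3u -> reported count %ld\n",
		       p, scale_count(cnt, p));
	return 0;
}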
On Thu, Jan 10, 2013 at 1:23 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Wed, 9 Jan 2013 18:57:09 +0100, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>> This partially reverts
>>
>> commit 6c085a728cf000ac1865d66f8c9b52935558b328
>> Author: Chris Wilson <chris@chris-wilson.co.uk>
>> Date:   Mon Aug 20 11:40:46 2012 +0200
>>
>>     drm/i915: Track unbound pages
>>
>> Closer inspection of that patch revealed a bunch of unrelated changes
>> in the shrinker:
>> - The shrinker count is now in pages instead of objects.
>> - For counting the shrinkable objects the old code only looked at the
>>   inactive list, the new code looks at all bound objects (including
>>   pinned ones). That is obviously in addition to the new unbound list.
>> - The shrinker count is no longer scaled with
>>   sysctl_vfs_cache_pressure.
>
> I made a mistake and copied the wrong code in my original
> implementation:
>
> vfs_cache_pressure
> ------------------
>
> Controls the tendency of the kernel to reclaim the memory which is used for
> caching of directory and inode objects.
>
> At the default value of vfs_cache_pressure=100 the kernel will attempt to
> reclaim dentries and inodes at a "fair" rate with respect to pagecache and
> swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
> to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
> never reclaim dentries and inodes due to memory pressure and this can easily
> lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
> causes the kernel to prefer to reclaim dentries and inodes.
>
> --------------------
>
> It's not an inode or a directory, and our pages are directly akin to the page
> and buffer caches, so I should never have used vfs_cache_pressure, and you
> have not justified why you are adding it back.

Yeah, I agree that the vfs_cache_pressure tuning is rather bogus, but
I'd prefer to revert all the shrinker-related changes for now. We can
remedy this in -next with a quick patch easily - burned child applies ;-)

>> - When actually shrinking objects, the old code first dropped
>>   purgeable objects, then normal (inactive) objects. Only then did it,
>>   in a last-ditch effort, idle the gpu and evict everything. The new
>>   code omits the intermediate step of evicting normal inactive
>>   objects.
>
> This is the crux of the papering, so just do this and call it what it is.

Agreed, I'll clarify the commit message a bit and stress that the
referenced bugs aren't really fixed, but at least the regressions should
be resolved.
-Daniel
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 51df4ee..7c42788 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1714,7 +1714,8 @@ i915_gem_object_put_pages(struct drm_i915_gem_object *obj)
 }
 
 static long
-i915_gem_purge(struct drm_i915_private *dev_priv, long target)
+__i915_gem_shrink(struct drm_i915_private *dev_priv, long target,
+		  bool purgeable_only)
 {
 	struct drm_i915_gem_object *obj, *next;
 	long count = 0;
@@ -1722,7 +1723,7 @@ i915_gem_purge(struct drm_i915_private *dev_priv, long target)
 	list_for_each_entry_safe(obj, next,
 				 &dev_priv->mm.unbound_list,
 				 gtt_list) {
-		if (i915_gem_object_is_purgeable(obj) &&
+		if ((i915_gem_object_is_purgeable(obj) || !purgeable_only) &&
 		    i915_gem_object_put_pages(obj) == 0) {
 			count += obj->base.size >> PAGE_SHIFT;
 			if (count >= target)
@@ -1733,7 +1734,7 @@ i915_gem_purge(struct drm_i915_private *dev_priv, long target)
 	list_for_each_entry_safe(obj, next,
 				 &dev_priv->mm.inactive_list,
 				 mm_list) {
-		if (i915_gem_object_is_purgeable(obj) &&
+		if ((i915_gem_object_is_purgeable(obj) || !purgeable_only) &&
 		    i915_gem_object_unbind(obj) == 0 &&
 		    i915_gem_object_put_pages(obj) == 0) {
 			count += obj->base.size >> PAGE_SHIFT;
@@ -1745,6 +1746,12 @@ i915_gem_purge(struct drm_i915_private *dev_priv, long target)
 	return count;
 }
 
+static long
+i915_gem_purge(struct drm_i915_private *dev_priv, long target)
+{
+	return __i915_gem_shrink(dev_priv, target, true);
+}
+
 static void
 i915_gem_shrink_all(struct drm_i915_private *dev_priv)
 {
@@ -4415,6 +4422,9 @@ i915_gem_inactive_shrink(struct shrinker *shrinker, struct shrink_control *sc)
 	if (nr_to_scan) {
 		nr_to_scan -= i915_gem_purge(dev_priv, nr_to_scan);
 		if (nr_to_scan > 0)
+			nr_to_scan -= __i915_gem_shrink(dev_priv, nr_to_scan,
+							false);
+		if (nr_to_scan > 0)
 			i915_gem_shrink_all(dev_priv);
 	}
 
@@ -4422,11 +4432,11 @@ i915_gem_inactive_shrink(struct shrinker *shrinker, struct shrink_control *sc)
 	list_for_each_entry(obj, &dev_priv->mm.unbound_list, gtt_list)
 		if (obj->pages_pin_count == 0)
 			cnt += obj->base.size >> PAGE_SHIFT;
-	list_for_each_entry(obj, &dev_priv->mm.bound_list, gtt_list)
+	list_for_each_entry(obj, &dev_priv->mm.inactive_list, gtt_list)
 		if (obj->pin_count == 0 && obj->pages_pin_count == 0)
 			cnt += obj->base.size >> PAGE_SHIFT;
 
 	if (unlock)
 		mutex_unlock(&dev->struct_mutex);
-	return cnt;
+	return cnt / 100 * sysctl_vfs_cache_pressure;
 }
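Taken together, the hunks above restore a staged fallback in i915_gem_inactive_shrink(): purge purgeable objects first, then any shrinkable object, and only then idle the GPU via i915_gem_shrink_all(). The standalone userspace sketch below models just that ordering with a made-up object array; it is not i915 code, only the control flow mirrors the diff.

/* Standalone model of the staged shrink the patch produces; the struct
 * and the sample numbers are made up, only the ordering mirrors the diff. */
#include <stdbool.h>
#include <stdio.h>

struct fake_obj {
	long pages;
	bool purgeable;
	bool freed;
};

/* Rough analogue of __i915_gem_shrink(): free objects until "target"
 * pages are reclaimed, optionally restricted to purgeable ones. */
static long fake_shrink(struct fake_obj *objs, int n, long target,
			bool purgeable_only)
{
	long count = 0;
	int i;

	for (i = 0; i < n && count < target; i++) {
		if (objs[i].freed)
			continue;
		if (objs[i].purgeable || !purgeable_only) {
			objs[i].freed = true;
			count += objs[i].pages;
		}
	}
	return count;
}

int main(void)
{
	struct fake_obj objs[] = {
		{ 16, true, false }, { 64, false, false }, { 32, false, false },
	};
	long nr_to_scan = 100;

	/* Stage 1: purgeable objects only (the i915_gem_purge() wrapper). */
	nr_to_scan -= fake_shrink(objs, 3, nr_to_scan, true);
	/* Stage 2: any shrinkable object (__i915_gem_shrink(..., false)). */
	if (nr_to_scan > 0)
		nr_to_scan -= fake_shrink(objs, 3, nr_to_scan, false);
	/* Stage 3: only now would i915_gem_shrink_all() idle the GPU. */
	if (nr_to_scan > 0)
		printf("would fall back to shrink_all()\n");
	else
		printf("satisfied without idling the GPU\n");
	return 0;
}

With these sample numbers the request is already satisfied in stage two, so the last-ditch shrink_all() path is never taken.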
This partially reverts

commit 6c085a728cf000ac1865d66f8c9b52935558b328
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Aug 20 11:40:46 2012 +0200

    drm/i915: Track unbound pages

Closer inspection of that patch revealed a bunch of unrelated changes
in the shrinker:
- The shrinker count is now in pages instead of objects.
- For counting the shrinkable objects the old code only looked at the
  inactive list, the new code looks at all bound objects (including
  pinned ones). That is obviously in addition to the new unbound list.
- The shrinker count is no longer scaled with
  sysctl_vfs_cache_pressure.
- When actually shrinking objects, the old code first dropped
  purgeable objects, then normal (inactive) objects. Only then did it,
  in a last-ditch effort, idle the gpu and evict everything. The new
  code omits the intermediate step of evicting normal inactive
  objects.

Save for the first change, which seems benign, the end result of all
these changes is that the shrinker is _much_ more likely to fall back
to the last-ditch resort of idling the gpu and evicting everything.
The old code could only do that if something else evicted lots of
objects meanwhile (since without any other changes the nr_to_scan will
be smaller than the object count).

Hence revert all these changes and restore the old shrinker behaviour,
with the minor adjustment that we now first scan the unbound list, then
the inactive list for each object category (purgeable or normal).

A similar patch has been tested by a few people affected by the gen4/5
hangs which started to appear in 3.7, which some people bisected to the
"drm/i915: Track unbound pages" commit. But just disabling the unbound
logic alone didn't change things at all.

References: https://bugs.freedesktop.org/show_bug.cgi?id=55984
References: https://bugs.freedesktop.org/show_bug.cgi?id=57122
References: https://bugs.freedesktop.org/show_bug.cgi?id=56916
References: https://bugs.freedesktop.org/show_bug.cgi?id=57136
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)
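As a counterpart to the scan-side sketch after the diff, the standalone sketch below (again a made-up object array and an assumed pressure value, not i915 code) models the count side the revert restores: only unbound objects with unpinned pages and unpinned inactive objects are reported, and the total is scaled in the vfs_cache_pressure style.

/* Standalone model of the count side after the revert; the struct and
 * the numbers are made up, only the filtering mirrors the patch. */
#include <stdbool.h>
#include <stdio.h>

struct fake_obj {
	long pages;
	bool bound;		/* on the (inactive) bound list? */
	bool pinned;		/* pin_count != 0 */
	bool pages_pinned;	/* pages_pin_count != 0 */
};

static long fake_count(const struct fake_obj *objs, int n,
		       unsigned int pressure)
{
	long cnt = 0;
	int i;

	for (i = 0; i < n; i++) {
		if (!objs[i].bound && !objs[i].pages_pinned)
			cnt += objs[i].pages;	/* unbound list */
		else if (objs[i].bound && !objs[i].pinned &&
			 !objs[i].pages_pinned)
			cnt += objs[i].pages;	/* inactive list */
	}
	return cnt / 100 * pressure;	/* vfs_cache_pressure-style scaling */
}

int main(void)
{
	struct fake_obj objs[] = {
		{ 400, false, false, false },	/* unbound, reclaimable */
		{ 300, true,  false, false },	/* inactive, reclaimable */
		{ 200, true,  true,  false },	/* pinned: not counted */
	};

	printf("reported pages: %ld\n", fake_count(objs, 3, 100));
	return 0;
}

Because the reported count only covers objects the first two scan stages can actually free, the nr_to_scan handed back by the shrinker core will normally not exceed them, which is the commit message's argument for why the old code only hit the idle-the-GPU fallback when other threads had evicted objects in the meantime.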