diff mbox

[4/4,v4] drm/i915: Convert execbuf code to use vmas

Message ID 1376442549-5087-4-git-send-email-benjamin.widawsky@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ben Widawsky Aug. 14, 2013, 1:09 a.m. UTC
From: Ben Widawsky <ben@bwidawsk.net>

In order to transition more of our code over to using a VMA instead of
an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
until now, we've only had a VMA when actually binding an object.

The previous patch helped handle the distinction on bound vs. unbound.
This patch will help us catch leaks, and other issues before we actually
shuffle a bunch of stuff around.

This attempts to convert all the execbuf code to speak in vmas. Since
the execbuf code is very self contained it was a nice isolated
conversion.

The meat of the code is about turning eb_objects into eb_vma, and then
wiring up the rest of the code to use vmas instead of obj, vm pairs.

Unfortunately, to do this, we must move the exec_list link from the obj
structure. This list is reused in the eviction code, so we must also
modify the eviction code to make this work.

WARNING: This patch makes an already hotly profiled path slower. The cost is
unavoidable. In reply to this mail, I will attach the extra data.

v2: Release table lock early, and two a 2 phase vma lookup to avoid
having to use a GFP_ATOMIC. (Chris)

v3: s/obj_exec_list/obj_exec_link/
Updates to address
commit 6d2b888569d366beb4be72cacfde41adee2c25e1
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 7 18:30:54 2013 +0100

    drm/i915: List objects allocated from stolen memory in debugfs

v4: Use obj = vma->obj for neatness in some places (Chris)
need_reloc_mappable() should return false if ppgtt (Chris)

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_debugfs.c        |  12 +-
 drivers/gpu/drm/i915/i915_drv.h            |  25 ++-
 drivers/gpu/drm/i915/i915_gem.c            |  28 ++-
 drivers/gpu/drm/i915/i915_gem_evict.c      |  31 ++-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 319 ++++++++++++++++-------------
 5 files changed, 232 insertions(+), 183 deletions(-)

Comments

Ben Widawsky Aug. 14, 2013, 1:11 a.m. UTC | #1
On Tue, Aug 13, 2013 at 06:09:09PM -0700, Ben Widawsky wrote:
> From: Ben Widawsky <ben@bwidawsk.net>
> 
> In order to transition more of our code over to using a VMA instead of
> an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
> until now, we've only had a VMA when actually binding an object.
> 
> The previous patch helped handle the distinction on bound vs. unbound.
> This patch will help us catch leaks, and other issues before we actually
> shuffle a bunch of stuff around.
> 
> This attempts to convert all the execbuf code to speak in vmas. Since
> the execbuf code is very self contained it was a nice isolated
> conversion.
> 
> The meat of the code is about turning eb_objects into eb_vma, and then
> wiring up the rest of the code to use vmas instead of obj, vm pairs.
> 
> Unfortunately, to do this, we must move the exec_list link from the obj
> structure. This list is reused in the eviction code, so we must also
> modify the eviction code to make this work.
> 
> WARNING: This patch makes an already hotly profiled path slower. The cost is
> unavoidable. In reply to this mail, I will attach the extra data.
> 

[snip]

Here is the output from gem_exec_lut_handle both before and after this
patch. The results honestly don't make sense to me, but I'll set Chris
parse it before scratching my head harder.

Before patch
============
relocation: buffers=   1: old=   8060 + 165.3*reloc, lut=   7816 + 164.8*reloc (ns)
relocation: buffers=   2: old=   6748 + 166.6*reloc, lut=   6952 + 165.4*reloc (ns)
relocation: buffers=   4: old=   8140 + 165.9*reloc, lut=   8216 + 165.4*reloc (ns)
relocation: buffers=   8: old=  10732 + 166.0*reloc, lut=  10615 + 165.2*reloc (ns)
relocation: buffers=  16: old=  15099 + 167.8*reloc, lut=  15337 + 165.3*reloc (ns)
relocation: buffers=  32: old=  26140 + 166.0*reloc, lut=  25488 + 165.5*reloc (ns)
relocation: buffers=  64: old=  46300 + 170.5*reloc, lut=  44279 + 166.7*reloc (ns)
relocation: buffers= 128: old=  84056 + 176.9*reloc, lut=  85379 + 166.3*reloc (ns)
relocation: buffers= 256: old= 174398 + 167.9*reloc, lut= 167744 + 167.0*reloc (ns)
relocation: buffers= 512: old= 349688 + 175.7*reloc, lut= 348590 + 170.8*reloc (ns)
relocation: buffers=1024: old= 726265 + 191.2*reloc, lut= 719774 + 180.2*reloc (ns)
relocation: buffers=2048: old=1456866 + 224.3*reloc, lut=1442087 + 173.0*reloc (ns)
skip-relocs: buffers=   1: old=   4445 + 16.0*reloc, lut=   4433 + 15.6*reloc (ns)
skip-relocs: buffers=   2: old=   4585 + 16.0*reloc, lut=   4571 + 15.6*reloc (ns)
skip-relocs: buffers=   4: old=   5667 + 16.0*reloc, lut=   5340 + 15.6*reloc (ns)
skip-relocs: buffers=   8: old=   6051 + 16.1*reloc, lut=   6026 + 15.6*reloc (ns)
skip-relocs: buffers=  16: old=   7953 + 16.1*reloc, lut=   7914 + 15.6*reloc (ns)
skip-relocs: buffers=  32: old=  11972 + 16.2*reloc, lut=  11875 + 15.7*reloc (ns)
skip-relocs: buffers=  64: old=  19999 + 16.5*reloc, lut=  19832 + 15.7*reloc (ns)
skip-relocs: buffers= 128: old=  37796 + 16.9*reloc, lut=  36539 + 15.9*reloc (ns)
skip-relocs: buffers= 256: old=  71604 + 18.1*reloc, lut=  71313 + 16.5*reloc (ns)
skip-relocs: buffers= 512: old= 152682 + 24.3*reloc, lut= 141379 + 27.9*reloc (ns)
skip-relocs: buffers=1024: old= 314116 + 41.7*reloc, lut= 303019 + 20.1*reloc (ns)
skip-relocs: buffers=2048: old= 619784 + 54.1*reloc, lut= 603931 + 20.0*reloc (ns)
no-relocs: buffers=   1: old=   4194 + 5.1*reloc, lut=   4206 + 4.8*reloc (ns)
no-relocs: buffers=   2: old=   4404 + 5.1*reloc, lut=   4381 + 4.8*reloc (ns)
no-relocs: buffers=   4: old=   4926 + 5.1*reloc, lut=   4921 + 4.8*reloc (ns)
no-relocs: buffers=   8: old=   5901 + 5.1*reloc, lut=   5822 + 4.9*reloc (ns)
no-relocs: buffers=  16: old=   7840 + 5.1*reloc, lut=   7737 + 4.9*reloc (ns)
no-relocs: buffers=  32: old=  11842 + 5.1*reloc, lut=  11681 + 4.9*reloc (ns)
no-relocs: buffers=  64: old=  19741 + 5.1*reloc, lut=  19542 + 4.8*reloc (ns)
no-relocs: buffers= 128: old=  36479 + 5.2*reloc, lut=  35958 + 4.9*reloc (ns)
no-relocs: buffers= 256: old=  70171 + 5.4*reloc, lut=  69390 + 5.2*reloc (ns)
no-relocs: buffers= 512: old= 147213 + 3.5*reloc, lut= 137953 + 13.0*reloc (ns)
no-relocs: buffers=1024: old= 300165 + 4.8*reloc, lut= 293852 + 4.9*reloc (ns)
no-relocs: buffers=2048: old= 597992 + 8.3*reloc, lut= 590185 + 2.1*reloc (ns)


After patch
===========
relocation: buffers=   1: old=   8075 + 81.4*reloc, lut=   7592 + 80.6*reloc (ns)
relocation: buffers=   2: old=   5744 + 82.3*reloc, lut=   5837 + 81.1*reloc (ns)
relocation: buffers=   4: old=   4875 + 82.7*reloc, lut=   4871 + 81.6*reloc (ns)
relocation: buffers=   8: old=   5729 + 82.7*reloc, lut=   5698 + 81.5*reloc (ns)
relocation: buffers=  16: old=   7952 + 83.0*reloc, lut=   7809 + 81.9*reloc (ns)
relocation: buffers=  32: old=  11884 + 82.9*reloc, lut=  11702 + 81.6*reloc (ns)
relocation: buffers=  64: old=  20388 + 83.4*reloc, lut=  19995 + 82.2*reloc (ns)
relocation: buffers= 128: old=  38057 + 85.0*reloc, lut=  37675 + 83.4*reloc (ns)
relocation: buffers= 256: old=  74912 + 87.0*reloc, lut=  74064 + 85.4*reloc (ns)
relocation: buffers= 512: old= 161136 + 94.8*reloc, lut= 157046 + 87.5*reloc (ns)
relocation: buffers=1024: old= 349443 + 107.0*reloc, lut= 342081 + 91.2*reloc (ns)
relocation: buffers=2048: old= 707951 + 131.8*reloc, lut= 690754 + 96.9*reloc (ns)
skip-relocs: buffers=   1: old=   2966 + 16.6*reloc, lut=   2963 + 15.6*reloc (ns)
skip-relocs: buffers=   2: old=   3083 + 16.5*reloc, lut=   3056 + 15.5*reloc (ns)
skip-relocs: buffers=   4: old=   3279 + 16.6*reloc, lut=   3242 + 15.6*reloc (ns)
skip-relocs: buffers=   8: old=   3692 + 16.7*reloc, lut=   3654 + 15.6*reloc (ns)
skip-relocs: buffers=  16: old=   4522 + 16.7*reloc, lut=   4461 + 15.5*reloc (ns)
skip-relocs: buffers=  32: old=   6254 + 16.7*reloc, lut=   6138 + 15.7*reloc (ns)
skip-relocs: buffers=  64: old=  10098 + 16.8*reloc, lut=   9939 + 15.7*reloc (ns)
skip-relocs: buffers= 128: old=  17983 + 17.6*reloc, lut=  17729 + 16.3*reloc (ns)
skip-relocs: buffers= 256: old=  34388 + 18.8*reloc, lut=  33981 + 17.6*reloc (ns)
skip-relocs: buffers= 512: old=  74211 + 25.2*reloc, lut=  72185 + 18.6*reloc (ns)
skip-relocs: buffers=1024: old= 160514 + 34.1*reloc, lut= 157086 + 20.3*reloc (ns)
skip-relocs: buffers=2048: old= 323954 + 51.5*reloc, lut= 315928 + 22.5*reloc (ns)
no-relocs: buffers=   1: old=   2840 + 5.1*reloc, lut=   2834 + 4.8*reloc (ns)
no-relocs: buffers=   2: old=   2938 + 5.1*reloc, lut=   2917 + 4.8*reloc (ns)
no-relocs: buffers=   4: old=   3220 + 5.1*reloc, lut=   3201 + 4.8*reloc (ns)
no-relocs: buffers=   8: old=   3614 + 5.1*reloc, lut=   3545 + 4.8*reloc (ns)
no-relocs: buffers=  16: old=   4437 + 5.1*reloc, lut=   4368 + 4.8*reloc (ns)
no-relocs: buffers=  32: old=   6105 + 5.1*reloc, lut=   6024 + 4.9*reloc (ns)
no-relocs: buffers=  64: old=   9864 + 5.1*reloc, lut=   9652 + 4.9*reloc (ns)
no-relocs: buffers= 128: old=  17388 + 5.1*reloc, lut=  17126 + 4.9*reloc (ns)
no-relocs: buffers= 256: old=  33087 + 5.4*reloc, lut=  32668 + 5.3*reloc (ns)
no-relocs: buffers= 512: old=  71476 + 5.0*reloc, lut=  69464 + 4.9*reloc (ns)
no-relocs: buffers=1024: old= 154379 + 4.9*reloc, lut= 152796 + 4.3*reloc (ns)
no-relocs: buffers=2048: old= 309435 + 5.0*reloc, lut= 301095 + 4.9*reloc (ns)
Chris Wilson Aug. 14, 2013, 7:58 a.m. UTC | #2
On Tue, Aug 13, 2013 at 06:11:59PM -0700, Ben Widawsky wrote:
> On Tue, Aug 13, 2013 at 06:09:09PM -0700, Ben Widawsky wrote:
> > From: Ben Widawsky <ben@bwidawsk.net>
> > 
> > In order to transition more of our code over to using a VMA instead of
> > an <OBJ, VM> pair - we must have the vma accessible at execbuf time. Up
> > until now, we've only had a VMA when actually binding an object.
> > 
> > The previous patch helped handle the distinction on bound vs. unbound.
> > This patch will help us catch leaks, and other issues before we actually
> > shuffle a bunch of stuff around.
> > 
> > This attempts to convert all the execbuf code to speak in vmas. Since
> > the execbuf code is very self contained it was a nice isolated
> > conversion.
> > 
> > The meat of the code is about turning eb_objects into eb_vma, and then
> > wiring up the rest of the code to use vmas instead of obj, vm pairs.
> > 
> > Unfortunately, to do this, we must move the exec_list link from the obj
> > structure. This list is reused in the eviction code, so we must also
> > modify the eviction code to make this work.
> > 
> > WARNING: This patch makes an already hotly profiled path slower. The cost is
> > unavoidable. In reply to this mail, I will attach the extra data.
> > 
> 
> [snip]
> 
> Here is the output from gem_exec_lut_handle both before and after this
> patch. The results honestly don't make sense to me, but I'll set Chris
> parse it before scratching my head harder.
> 
> Before patch
> ============
> relocation: buffers=   1: old=   8060 + 165.3*reloc, lut=   7816 + 164.8*reloc (ns)
> relocation: buffers=   2: old=   6748 + 166.6*reloc, lut=   6952 + 165.4*reloc (ns)
> relocation: buffers=   4: old=   8140 + 165.9*reloc, lut=   8216 + 165.4*reloc (ns)
> relocation: buffers=   8: old=  10732 + 166.0*reloc, lut=  10615 + 165.2*reloc (ns)
> relocation: buffers=  16: old=  15099 + 167.8*reloc, lut=  15337 + 165.3*reloc (ns)
> relocation: buffers=  32: old=  26140 + 166.0*reloc, lut=  25488 + 165.5*reloc (ns)
> relocation: buffers=  64: old=  46300 + 170.5*reloc, lut=  44279 + 166.7*reloc (ns)
> relocation: buffers= 128: old=  84056 + 176.9*reloc, lut=  85379 + 166.3*reloc (ns)
> relocation: buffers= 256: old= 174398 + 167.9*reloc, lut= 167744 + 167.0*reloc (ns)
> relocation: buffers= 512: old= 349688 + 175.7*reloc, lut= 348590 + 170.8*reloc (ns)
> relocation: buffers=1024: old= 726265 + 191.2*reloc, lut= 719774 + 180.2*reloc (ns)
> relocation: buffers=2048: old=1456866 + 224.3*reloc, lut=1442087 + 173.0*reloc (ns)
> skip-relocs: buffers=   1: old=   4445 + 16.0*reloc, lut=   4433 + 15.6*reloc (ns)
> skip-relocs: buffers=   2: old=   4585 + 16.0*reloc, lut=   4571 + 15.6*reloc (ns)
> skip-relocs: buffers=   4: old=   5667 + 16.0*reloc, lut=   5340 + 15.6*reloc (ns)
> skip-relocs: buffers=   8: old=   6051 + 16.1*reloc, lut=   6026 + 15.6*reloc (ns)
> skip-relocs: buffers=  16: old=   7953 + 16.1*reloc, lut=   7914 + 15.6*reloc (ns)
> skip-relocs: buffers=  32: old=  11972 + 16.2*reloc, lut=  11875 + 15.7*reloc (ns)
> skip-relocs: buffers=  64: old=  19999 + 16.5*reloc, lut=  19832 + 15.7*reloc (ns)
> skip-relocs: buffers= 128: old=  37796 + 16.9*reloc, lut=  36539 + 15.9*reloc (ns)
> skip-relocs: buffers= 256: old=  71604 + 18.1*reloc, lut=  71313 + 16.5*reloc (ns)
> skip-relocs: buffers= 512: old= 152682 + 24.3*reloc, lut= 141379 + 27.9*reloc (ns)
> skip-relocs: buffers=1024: old= 314116 + 41.7*reloc, lut= 303019 + 20.1*reloc (ns)
> skip-relocs: buffers=2048: old= 619784 + 54.1*reloc, lut= 603931 + 20.0*reloc (ns)
> no-relocs: buffers=   1: old=   4194 + 5.1*reloc, lut=   4206 + 4.8*reloc (ns)
> no-relocs: buffers=   2: old=   4404 + 5.1*reloc, lut=   4381 + 4.8*reloc (ns)
> no-relocs: buffers=   4: old=   4926 + 5.1*reloc, lut=   4921 + 4.8*reloc (ns)
> no-relocs: buffers=   8: old=   5901 + 5.1*reloc, lut=   5822 + 4.9*reloc (ns)
> no-relocs: buffers=  16: old=   7840 + 5.1*reloc, lut=   7737 + 4.9*reloc (ns)
> no-relocs: buffers=  32: old=  11842 + 5.1*reloc, lut=  11681 + 4.9*reloc (ns)
> no-relocs: buffers=  64: old=  19741 + 5.1*reloc, lut=  19542 + 4.8*reloc (ns)
> no-relocs: buffers= 128: old=  36479 + 5.2*reloc, lut=  35958 + 4.9*reloc (ns)
> no-relocs: buffers= 256: old=  70171 + 5.4*reloc, lut=  69390 + 5.2*reloc (ns)
> no-relocs: buffers= 512: old= 147213 + 3.5*reloc, lut= 137953 + 13.0*reloc (ns)
> no-relocs: buffers=1024: old= 300165 + 4.8*reloc, lut= 293852 + 4.9*reloc (ns)
> no-relocs: buffers=2048: old= 597992 + 8.3*reloc, lut= 590185 + 2.1*reloc (ns)
> 
> 
> After patch
> ===========
> relocation: buffers=   1: old=   8075 + 81.4*reloc, lut=   7592 + 80.6*reloc (ns)
> relocation: buffers=   2: old=   5744 + 82.3*reloc, lut=   5837 + 81.1*reloc (ns)
> relocation: buffers=   4: old=   4875 + 82.7*reloc, lut=   4871 + 81.6*reloc (ns)
> relocation: buffers=   8: old=   5729 + 82.7*reloc, lut=   5698 + 81.5*reloc (ns)
> relocation: buffers=  16: old=   7952 + 83.0*reloc, lut=   7809 + 81.9*reloc (ns)
> relocation: buffers=  32: old=  11884 + 82.9*reloc, lut=  11702 + 81.6*reloc (ns)
> relocation: buffers=  64: old=  20388 + 83.4*reloc, lut=  19995 + 82.2*reloc (ns)
> relocation: buffers= 128: old=  38057 + 85.0*reloc, lut=  37675 + 83.4*reloc (ns)
> relocation: buffers= 256: old=  74912 + 87.0*reloc, lut=  74064 + 85.4*reloc (ns)
> relocation: buffers= 512: old= 161136 + 94.8*reloc, lut= 157046 + 87.5*reloc (ns)
> relocation: buffers=1024: old= 349443 + 107.0*reloc, lut= 342081 + 91.2*reloc (ns)
> relocation: buffers=2048: old= 707951 + 131.8*reloc, lut= 690754 + 96.9*reloc (ns)
> skip-relocs: buffers=   1: old=   2966 + 16.6*reloc, lut=   2963 + 15.6*reloc (ns)
> skip-relocs: buffers=   2: old=   3083 + 16.5*reloc, lut=   3056 + 15.5*reloc (ns)
> skip-relocs: buffers=   4: old=   3279 + 16.6*reloc, lut=   3242 + 15.6*reloc (ns)
> skip-relocs: buffers=   8: old=   3692 + 16.7*reloc, lut=   3654 + 15.6*reloc (ns)
> skip-relocs: buffers=  16: old=   4522 + 16.7*reloc, lut=   4461 + 15.5*reloc (ns)
> skip-relocs: buffers=  32: old=   6254 + 16.7*reloc, lut=   6138 + 15.7*reloc (ns)
> skip-relocs: buffers=  64: old=  10098 + 16.8*reloc, lut=   9939 + 15.7*reloc (ns)
> skip-relocs: buffers= 128: old=  17983 + 17.6*reloc, lut=  17729 + 16.3*reloc (ns)
> skip-relocs: buffers= 256: old=  34388 + 18.8*reloc, lut=  33981 + 17.6*reloc (ns)
> skip-relocs: buffers= 512: old=  74211 + 25.2*reloc, lut=  72185 + 18.6*reloc (ns)
> skip-relocs: buffers=1024: old= 160514 + 34.1*reloc, lut= 157086 + 20.3*reloc (ns)
> skip-relocs: buffers=2048: old= 323954 + 51.5*reloc, lut= 315928 + 22.5*reloc (ns)
> no-relocs: buffers=   1: old=   2840 + 5.1*reloc, lut=   2834 + 4.8*reloc (ns)
> no-relocs: buffers=   2: old=   2938 + 5.1*reloc, lut=   2917 + 4.8*reloc (ns)
> no-relocs: buffers=   4: old=   3220 + 5.1*reloc, lut=   3201 + 4.8*reloc (ns)
> no-relocs: buffers=   8: old=   3614 + 5.1*reloc, lut=   3545 + 4.8*reloc (ns)
> no-relocs: buffers=  16: old=   4437 + 5.1*reloc, lut=   4368 + 4.8*reloc (ns)
> no-relocs: buffers=  32: old=   6105 + 5.1*reloc, lut=   6024 + 4.9*reloc (ns)
> no-relocs: buffers=  64: old=   9864 + 5.1*reloc, lut=   9652 + 4.9*reloc (ns)
> no-relocs: buffers= 128: old=  17388 + 5.1*reloc, lut=  17126 + 4.9*reloc (ns)
> no-relocs: buffers= 256: old=  33087 + 5.4*reloc, lut=  32668 + 5.3*reloc (ns)
> no-relocs: buffers= 512: old=  71476 + 5.0*reloc, lut=  69464 + 4.9*reloc (ns)
> no-relocs: buffers=1024: old= 154379 + 4.9*reloc, lut= 152796 + 4.3*reloc (ns)
> no-relocs: buffers=2048: old= 309435 + 5.0*reloc, lut= 301095 + 4.9*reloc (ns)

Hmm, either the patch really did make things faster or the results are
being subjected to cpufreq.

relocation: do the full relocation processing
skip-relocs: all the relocation addresses are correct, so no rewrite
no-relocs: no buffers moved, no relocation processing

"Buffers" is the number of buffers passed into execbuffer, and so
we measure the cost of trying to process a certain complexity of a batch.
Then we do a least-squares plot through a number of batches with varying
numbers of relocations to estimate the overhead of processing the
execbuffer array versus the cost of each addition relocation in the
batch. The first number (the x_0 intercept in nanoseconds) should be the
measure of how long it takes to grab all the buffers into our local
structures - this is what I was expecting to be worst hit by the vma
patches.
-Chris
Daniel Vetter Aug. 14, 2013, 9:38 a.m. UTC | #3
Hi Ben,

For my own review benefit I've slashed your patch into pieces and split out 3
prep patches. Overall diff is unchanged with the exception of a now stale FIXME
comment that I've killed. Still need to do the in-depth review of the last
patch, but looks good thus far.

Cheers, Daniel
Ben Widawsky Aug. 14, 2013, 11:22 p.m. UTC | #4
On Wed, Aug 14, 2013 at 11:43:58PM +0100, Chris Wilson wrote:
> These are my numbers for a beefy haswell box (note the really
> interesting numbers will be on Baytrail):
> 
> unpatched:
> 
> relocation: buffers=   1: old=  21945 + 34.4*reloc, lut=  21814 + 34.0*reloc (ns)
> relocation: buffers=   2: old=  15947 + 36.4*reloc, lut=  16169 + 35.4*reloc (ns)
> relocation: buffers=   4: old=  12711 + 37.6*reloc, lut=  13039 + 36.7*reloc (ns)
> relocation: buffers=   8: old=   6154 + 40.9*reloc, lut=   7201 + 38.9*reloc (ns)
> relocation: buffers=  16: old=   4846 + 41.6*reloc, lut=   5337 + 40.6*reloc (ns)
> relocation: buffers=  32: old=   7097 + 41.9*reloc, lut=   6943 + 41.0*reloc (ns)
> relocation: buffers=  64: old=  13318 + 41.9*reloc, lut=  12748 + 41.2*reloc (ns)
> relocation: buffers= 128: old=  27282 + 43.0*reloc, lut=  25778 + 41.7*reloc (ns)
> relocation: buffers= 256: old=  54535 + 45.2*reloc, lut=  51912 + 43.7*reloc (ns)
> relocation: buffers= 512: old= 137447 + 53.2*reloc, lut= 129333 + 45.5*reloc (ns)
> relocation: buffers=1024: old= 307347 + 66.5*reloc, lut= 291487 + 48.1*reloc (ns)
> relocation: buffers=2048: old= 606300 + 92.1*reloc, lut= 574774 + 51.6*reloc (ns)
> skip-relocs: buffers=   1: old=   1583 + 15.6*reloc, lut=   1516 + 14.5*reloc (ns)
> skip-relocs: buffers=   2: old=   1621 + 15.6*reloc, lut=   1603 + 14.5*reloc (ns)
> skip-relocs: buffers=   4: old=   1791 + 15.6*reloc, lut=   1777 + 14.5*reloc (ns)
> skip-relocs: buffers=   8: old=   2009 + 15.6*reloc, lut=   2024 + 14.6*reloc (ns)
> skip-relocs: buffers=  16: old=   2637 + 15.7*reloc, lut=   2564 + 14.6*reloc (ns)
> skip-relocs: buffers=  32: old=   3835 + 15.8*reloc, lut=   3785 + 14.7*reloc (ns)
> skip-relocs: buffers=  64: old=   6996 + 15.8*reloc, lut=   6681 + 14.7*reloc (ns)
> skip-relocs: buffers= 128: old=  14333 + 16.4*reloc, lut=  13560 + 15.2*reloc (ns)
> skip-relocs: buffers= 256: old=  28092 + 17.7*reloc, lut=  26759 + 16.2*reloc (ns)
> skip-relocs: buffers= 512: old=  70885 + 25.2*reloc, lut=  66713 + 17.9*reloc (ns)
> skip-relocs: buffers=1024: old= 158520 + 35.2*reloc, lut= 150828 + 20.1*reloc (ns)
> skip-relocs: buffers=2048: old= 314208 + 54.3*reloc, lut= 298343 + 22.1*reloc (ns)
> no-relocs: buffers=   1: old=   1533 + 5.2*reloc, lut=   1498 + 4.9*reloc (ns)
> no-relocs: buffers=   2: old=   1518 + 5.2*reloc, lut=   1505 + 4.9*reloc (ns)
> no-relocs: buffers=   4: old=   1647 + 5.2*reloc, lut=   1593 + 4.9*reloc (ns)
> no-relocs: buffers=   8: old=   1882 + 5.3*reloc, lut=   1874 + 5.0*reloc (ns)
> no-relocs: buffers=  16: old=   2399 + 5.3*reloc, lut=   2341 + 5.0*reloc (ns)
> no-relocs: buffers=  32: old=   3638 + 5.3*reloc, lut=   3554 + 5.0*reloc (ns)
> no-relocs: buffers=  64: old=   6622 + 5.3*reloc, lut=   6308 + 5.1*reloc (ns)
> no-relocs: buffers= 128: old=  13584 + 5.3*reloc, lut=  12872 + 5.1*reloc (ns)
> no-relocs: buffers= 256: old=  26519 + 5.8*reloc, lut=  25234 + 5.5*reloc (ns)
> no-relocs: buffers= 512: old=  67128 + 5.4*reloc, lut=  63054 + 5.2*reloc (ns)
> no-relocs: buffers=1024: old= 146705 + 5.2*reloc, lut= 139020 + 5.1*reloc (ns)
> no-relocs: buffers=2048: old= 290319 + 5.4*reloc, lut= 274705 + 5.4*reloc (ns)
> 
> vma(execbuffer):
> 
> relocation: buffers=   1: old=  21922 + 34.6*reloc, lut=  21510 + 34.0*reloc (ns)
> relocation: buffers=   2: old=  16851 + 37.4*reloc, lut=  17123 + 35.4*reloc (ns)
> relocation: buffers=   4: old=  13234 + 37.8*reloc, lut=  13436 + 36.9*reloc (ns)
> relocation: buffers=   8: old=   6549 + 40.8*reloc, lut=   6512 + 39.8*reloc (ns)
> relocation: buffers=  16: old=   5012 + 41.8*reloc, lut=   4883 + 41.0*reloc (ns)
> relocation: buffers=  32: old=   8591 + 42.2*reloc, lut=   8377 + 41.1*reloc (ns)
> relocation: buffers=  64: old=  16051 + 42.8*reloc, lut=  15658 + 41.7*reloc (ns)
> relocation: buffers= 128: old=  33397 + 44.5*reloc, lut=  32705 + 43.3*reloc (ns)
> relocation: buffers= 256: old=  68012 + 46.8*reloc, lut=  66904 + 45.5*reloc (ns)
> relocation: buffers= 512: old= 160162 + 56.4*reloc, lut= 155586 + 49.1*reloc (ns)
> relocation: buffers=1024: old= 348728 + 71.8*reloc, lut= 338113 + 55.1*reloc (ns)
> relocation: buffers=2048: old= 699331 + 98.7*reloc, lut= 675969 + 62.2*reloc (ns)
> skip-relocs: buffers=   1: old=   1642 + 16.5*reloc, lut=   1588 + 15.6*reloc (ns)
> skip-relocs: buffers=   2: old=   1676 + 16.4*reloc, lut=   1663 + 15.6*reloc (ns)
> skip-relocs: buffers=   4: old=   1926 + 16.4*reloc, lut=   1891 + 15.6*reloc (ns)
> skip-relocs: buffers=   8: old=   2218 + 16.6*reloc, lut=   2212 + 15.7*reloc (ns)
> skip-relocs: buffers=  16: old=   2933 + 16.6*reloc, lut=   2880 + 15.7*reloc (ns)
> skip-relocs: buffers=  32: old=   4594 + 16.6*reloc, lut=   4523 + 15.8*reloc (ns)
> skip-relocs: buffers=  64: old=   8414 + 16.8*reloc, lut=   8210 + 15.9*reloc (ns)
> skip-relocs: buffers= 128: old=  17429 + 17.9*reloc, lut=  17062 + 16.8*reloc (ns)
> skip-relocs: buffers= 256: old=  34794 + 19.8*reloc, lut=  34144 + 18.4*reloc (ns)
> skip-relocs: buffers= 512: old=  82287 + 27.6*reloc, lut=  80002 + 20.8*reloc (ns)
> skip-relocs: buffers=1024: old= 179851 + 38.0*reloc, lut= 174574 + 23.9*reloc (ns)
> skip-relocs: buffers=2048: old= 361511 + 57.2*reloc, lut= 350132 + 26.8*reloc (ns)
> no-relocs: buffers=   1: old=   1581 + 5.2*reloc, lut=   1579 + 4.9*reloc (ns)
> no-relocs: buffers=   2: old=   1609 + 5.2*reloc, lut=   1572 + 4.9*reloc (ns)
> no-relocs: buffers=   4: old=   1701 + 5.3*reloc, lut=   1685 + 4.9*reloc (ns)
> no-relocs: buffers=   8: old=   2084 + 5.3*reloc, lut=   2033 + 5.0*reloc (ns)
> no-relocs: buffers=  16: old=   2747 + 5.3*reloc, lut=   2686 + 5.0*reloc (ns)
> no-relocs: buffers=  32: old=   4379 + 5.3*reloc, lut=   4285 + 5.0*reloc (ns)
> no-relocs: buffers=  64: old=   8049 + 5.3*reloc, lut=   7850 + 5.1*reloc (ns)
> no-relocs: buffers= 128: old=  16641 + 5.4*reloc, lut=  16301 + 5.2*reloc (ns)
> no-relocs: buffers= 256: old=  33111 + 5.7*reloc, lut=  32539 + 5.5*reloc (ns)
> no-relocs: buffers= 512: old=  79898 + 5.4*reloc, lut=  77517 + 5.2*reloc (ns)
> no-relocs: buffers=1024: old= 172199 + 5.2*reloc, lut= 166907 + 5.1*reloc (ns)
> no-relocs: buffers=2048: old= 345542 + 5.2*reloc, lut= 334300 + 5.3*reloc (ns)
> 
> So there is measurable degradation for the extra indirections, both for
> looking up the execbuffers and for performing the relocations. Though it
> doesn't merit anything more than a footnote in the changelog.
> -Chris
> 

I'm sad I can't reproduce it. I think I amended the commit message
already, I can do more if you want.
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index eb87865..4785d8c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -195,9 +195,9 @@  static int obj_rank_by_stolen(void *priv,
 			      struct list_head *A, struct list_head *B)
 {
 	struct drm_i915_gem_object *a =
-		container_of(A, struct drm_i915_gem_object, exec_list);
+		container_of(A, struct drm_i915_gem_object, obj_exec_link);
 	struct drm_i915_gem_object *b =
-		container_of(B, struct drm_i915_gem_object, exec_list);
+		container_of(B, struct drm_i915_gem_object, obj_exec_link);
 
 	return a->stolen->start - b->stolen->start;
 }
@@ -221,7 +221,7 @@  static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 		if (obj->stolen == NULL)
 			continue;
 
-		list_add(&obj->exec_list, &stolen);
+		list_add(&obj->obj_exec_link, &stolen);
 
 		total_obj_size += obj->base.size;
 		total_gtt_size += i915_gem_obj_ggtt_size(obj);
@@ -231,7 +231,7 @@  static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 		if (obj->stolen == NULL)
 			continue;
 
-		list_add(&obj->exec_list, &stolen);
+		list_add(&obj->obj_exec_link, &stolen);
 
 		total_obj_size += obj->base.size;
 		count++;
@@ -239,11 +239,11 @@  static int i915_gem_stolen_list_info(struct seq_file *m, void *data)
 	list_sort(NULL, &stolen, obj_rank_by_stolen);
 	seq_puts(m, "Stolen:\n");
 	while (!list_empty(&stolen)) {
-		obj = list_first_entry(&stolen, typeof(*obj), exec_list);
+		obj = list_first_entry(&stolen, typeof(*obj), obj_exec_link);
 		seq_puts(m, "   ");
 		describe_obj(m, obj);
 		seq_putc(m, '\n');
-		list_del_init(&obj->exec_list);
+		list_del_init(&obj->obj_exec_link);
 	}
 	mutex_unlock(&dev->struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3fc4324..2147e4d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -563,6 +563,17 @@  struct i915_vma {
 	struct list_head mm_list;
 
 	struct list_head vma_link; /* Link in the object's VMA list */
+
+	/** This vma's place in the batchbuffer or on the eviction list */
+	struct list_head exec_list;
+
+	/**
+	 * Used for performing relocations during execbuffer insertion.
+	 */
+	struct hlist_node exec_node;
+	unsigned long exec_handle;
+	struct drm_i915_gem_exec_object2 *exec_entry;
+
 };
 
 struct i915_ctx_hang_stats {
@@ -1312,8 +1323,8 @@  struct drm_i915_gem_object {
 	struct list_head global_list;
 
 	struct list_head ring_list;
-	/** This object's place in the batchbuffer or on the eviction list */
-	struct list_head exec_list;
+	/** Used in execbuf to temporarily hold a ref */
+	struct list_head obj_exec_link;
 
 	/**
 	 * This is set if the object is on the active lists (has pending
@@ -1399,13 +1410,6 @@  struct drm_i915_gem_object {
 	void *dma_buf_vmapping;
 	int vmapping_count;
 
-	/**
-	 * Used for performing relocations during execbuffer insertion.
-	 */
-	struct hlist_node exec_node;
-	unsigned long exec_handle;
-	struct drm_i915_gem_exec_object2 *exec_entry;
-
 	struct intel_ring_buffer *ring;
 
 	/** Breadcrumb of last rendering to the buffer. */
@@ -1907,6 +1911,9 @@  unsigned long i915_gem_obj_size(struct drm_i915_gem_object *o,
 				struct i915_address_space *vm);
 struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 				     struct i915_address_space *vm);
+struct i915_vma *
+i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
+				  struct i915_address_space *vm);
 /* Some GGTT VM helpers */
 #define obj_to_ggtt(obj) \
 	(&((struct drm_i915_private *)(obj)->base.dev->dev_private)->gtt.base)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 01cc016..6bd341a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3111,8 +3111,7 @@  i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 	struct i915_vma *vma;
 	int ret;
 
-	if (WARN_ON(!list_empty(&obj->vma_list)))
-		return -EBUSY;
+	BUG_ON(!i915_is_ggtt(vm));
 
 	fence_size = i915_gem_get_gtt_size(dev,
 					   obj->base.size,
@@ -3152,16 +3151,15 @@  i915_gem_object_bind_to_vm(struct drm_i915_gem_object *obj,
 
 	i915_gem_object_pin_pages(obj);
 
-	/* FIXME: For now we only ever use 1 VMA per object */
-	BUG_ON(!i915_is_ggtt(vm));
-	WARN_ON(!list_empty(&obj->vma_list));
-
-	vma = i915_gem_vma_create(obj, vm);
+	vma = i915_gem_obj_lookup_or_create_vma(obj, vm);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto err_unpin;
 	}
 
+	/* For now we only ever use 1 vma per object */
+	WARN_ON(!list_is_singular(&obj->vma_list));
+
 search_free:
 	ret = drm_mm_insert_node_in_range_generic(&vm->mm, &vma->node,
 						  size, alignment,
@@ -3981,7 +3979,7 @@  void i915_gem_object_init(struct drm_i915_gem_object *obj,
 {
 	INIT_LIST_HEAD(&obj->global_list);
 	INIT_LIST_HEAD(&obj->ring_list);
-	INIT_LIST_HEAD(&obj->exec_list);
+	INIT_LIST_HEAD(&obj->obj_exec_link);
 	INIT_LIST_HEAD(&obj->vma_list);
 
 	obj->ops = ops;
@@ -4121,6 +4119,7 @@  struct i915_vma *i915_gem_vma_create(struct drm_i915_gem_object *obj,
 
 	INIT_LIST_HEAD(&vma->vma_link);
 	INIT_LIST_HEAD(&vma->mm_list);
+	INIT_LIST_HEAD(&vma->exec_list);
 	vma->vm = vm;
 	vma->obj = obj;
 
@@ -4870,3 +4869,16 @@  struct i915_vma *i915_gem_obj_to_vma(struct drm_i915_gem_object *obj,
 
 	return NULL;
 }
+
+struct i915_vma *
+i915_gem_obj_lookup_or_create_vma(struct drm_i915_gem_object *obj,
+				  struct i915_address_space *vm)
+{
+	struct i915_vma *vma;
+
+	vma = i915_gem_obj_to_vma(obj, vm);
+	if (!vma)
+		vma = i915_gem_vma_create(obj, vm);
+
+	return vma;
+}
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 425939b..8787588 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -37,7 +37,7 @@  mark_free(struct i915_vma *vma, struct list_head *unwind)
 	if (vma->obj->pin_count)
 		return false;
 
-	list_add(&vma->obj->exec_list, unwind);
+	list_add(&vma->exec_list, unwind);
 	return drm_mm_scan_add_block(&vma->node);
 }
 
@@ -49,7 +49,6 @@  i915_gem_evict_something(struct drm_device *dev, struct i915_address_space *vm,
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct list_head eviction_list, unwind_list;
 	struct i915_vma *vma;
-	struct drm_i915_gem_object *obj;
 	int ret = 0;
 
 	trace_i915_gem_evict(dev, min_size, alignment, mappable);
@@ -104,14 +103,13 @@  i915_gem_evict_something(struct drm_device *dev, struct i915_address_space *vm,
 none:
 	/* Nothing found, clean up and bail out! */
 	while (!list_empty(&unwind_list)) {
-		obj = list_first_entry(&unwind_list,
-				       struct drm_i915_gem_object,
+		vma = list_first_entry(&unwind_list,
+				       struct i915_vma,
 				       exec_list);
-		vma = i915_gem_obj_to_vma(obj, vm);
 		ret = drm_mm_scan_remove_block(&vma->node);
 		BUG_ON(ret);
 
-		list_del_init(&obj->exec_list);
+		list_del_init(&vma->exec_list);
 	}
 
 	/* We expect the caller to unpin, evict all and try again, or give up.
@@ -125,28 +123,27 @@  found:
 	 * temporary list. */
 	INIT_LIST_HEAD(&eviction_list);
 	while (!list_empty(&unwind_list)) {
-		obj = list_first_entry(&unwind_list,
-				       struct drm_i915_gem_object,
+		vma = list_first_entry(&unwind_list,
+				       struct i915_vma,
 				       exec_list);
-		vma = i915_gem_obj_to_vma(obj, vm);
 		if (drm_mm_scan_remove_block(&vma->node)) {
-			list_move(&obj->exec_list, &eviction_list);
-			drm_gem_object_reference(&obj->base);
+			list_move(&vma->exec_list, &eviction_list);
+			drm_gem_object_reference(&vma->obj->base);
 			continue;
 		}
-		list_del_init(&obj->exec_list);
+		list_del_init(&vma->exec_list);
 	}
 
 	/* Unbinding will emit any required flushes */
 	while (!list_empty(&eviction_list)) {
-		obj = list_first_entry(&eviction_list,
-				       struct drm_i915_gem_object,
+		vma = list_first_entry(&eviction_list,
+				       struct i915_vma,
 				       exec_list);
 		if (ret == 0)
-			ret = i915_vma_unbind(i915_gem_obj_to_vma(obj, vm));
+			ret = i915_vma_unbind(vma);
 
-		list_del_init(&obj->exec_list);
-		drm_gem_object_unreference(&obj->base);
+		list_del_init(&vma->exec_list);
+		drm_gem_object_unreference(&vma->obj->base);
 	}
 
 	return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 7dcf78c..3ca18bc 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -33,24 +33,24 @@ 
 #include "intel_drv.h"
 #include <linux/dma_remapping.h>
 
-struct eb_objects {
-	struct list_head objects;
+struct eb_vmas {
+	struct list_head vmas;
 	int and;
 	union {
-		struct drm_i915_gem_object *lut[0];
+		struct i915_vma *lut[0];
 		struct hlist_head buckets[0];
 	};
 };
 
-static struct eb_objects *
-eb_create(struct drm_i915_gem_execbuffer2 *args)
+static struct eb_vmas *
+eb_create(struct drm_i915_gem_execbuffer2 *args, struct i915_address_space *vm)
 {
-	struct eb_objects *eb = NULL;
+	struct eb_vmas *eb = NULL;
 
 	if (args->flags & I915_EXEC_HANDLE_LUT) {
 		int size = args->buffer_count;
-		size *= sizeof(struct drm_i915_gem_object *);
-		size += sizeof(struct eb_objects);
+		size *= sizeof(struct i915_vma *);
+		size += sizeof(struct eb_vmas);
 		eb = kmalloc(size, GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
 	}
 
@@ -61,7 +61,7 @@  eb_create(struct drm_i915_gem_execbuffer2 *args)
 		while (count > 2*size)
 			count >>= 1;
 		eb = kzalloc(count*sizeof(struct hlist_head) +
-			     sizeof(struct eb_objects),
+			     sizeof(struct eb_vmas),
 			     GFP_TEMPORARY);
 		if (eb == NULL)
 			return eb;
@@ -70,64 +70,97 @@  eb_create(struct drm_i915_gem_execbuffer2 *args)
 	} else
 		eb->and = -args->buffer_count;
 
-	INIT_LIST_HEAD(&eb->objects);
+	INIT_LIST_HEAD(&eb->vmas);
 	return eb;
 }
 
 static void
-eb_reset(struct eb_objects *eb)
+eb_reset(struct eb_vmas *eb)
 {
 	if (eb->and >= 0)
 		memset(eb->buckets, 0, (eb->and+1)*sizeof(struct hlist_head));
 }
 
 static int
-eb_lookup_objects(struct eb_objects *eb,
-		  struct drm_i915_gem_exec_object2 *exec,
-		  const struct drm_i915_gem_execbuffer2 *args,
-		  struct drm_file *file)
+eb_lookup_vmas(struct eb_vmas *eb,
+	       struct drm_i915_gem_exec_object2 *exec,
+	       const struct drm_i915_gem_execbuffer2 *args,
+	       struct i915_address_space *vm,
+	       struct drm_file *file)
 {
-	int i;
+	struct drm_i915_gem_object *obj;
+	struct list_head objects;
+	int i, ret = 0;
 
+	INIT_LIST_HEAD(&objects);
 	spin_lock(&file->table_lock);
+	/* Grab a reference to the object and release the lock so we can lookup
+	 * or create the VMA without using GFP_ATOMIC */
 	for (i = 0; i < args->buffer_count; i++) {
-		struct drm_i915_gem_object *obj;
-
 		obj = to_intel_bo(idr_find(&file->object_idr, exec[i].handle));
 		if (obj == NULL) {
 			spin_unlock(&file->table_lock);
 			DRM_DEBUG("Invalid object handle %d at index %d\n",
 				   exec[i].handle, i);
-			return -ENOENT;
+			ret = -ENOENT;
+			goto out;
 		}
 
-		if (!list_empty(&obj->exec_list)) {
+		if (!list_empty(&obj->obj_exec_link)) {
 			spin_unlock(&file->table_lock);
 			DRM_DEBUG("Object %p [handle %d, index %d] appears more than once in object list\n",
 				   obj, exec[i].handle, i);
-			return -EINVAL;
+			ret = -EINVAL;
+			goto out;
 		}
 
 		drm_gem_object_reference(&obj->base);
-		list_add_tail(&obj->exec_list, &eb->objects);
+		list_add_tail(&obj->obj_exec_link, &objects);
+	}
+	spin_unlock(&file->table_lock);
 
-		obj->exec_entry = &exec[i];
+	i = 0;
+	list_for_each_entry(obj, &objects, obj_exec_link) {
+		struct i915_vma *vma;
+
+		vma = i915_gem_obj_lookup_or_create_vma(obj, vm);
+		if (IS_ERR(vma)) {
+			/* XXX: We don't need an error path fro vma because if
+			 * the vma was created just for this execbuf, object
+			 * unreference should kill it off.*/
+			DRM_DEBUG("Failed to lookup VMA\n");
+			ret = PTR_ERR(vma);
+			goto out;
+		}
+
+		list_add_tail(&vma->exec_list, &eb->vmas);
+
+		vma->exec_entry = &exec[i];
 		if (eb->and < 0) {
-			eb->lut[i] = obj;
+			eb->lut[i] = vma;
 		} else {
 			uint32_t handle = args->flags & I915_EXEC_HANDLE_LUT ? i : exec[i].handle;
-			obj->exec_handle = handle;
-			hlist_add_head(&obj->exec_node,
+			vma->exec_handle = handle;
+			hlist_add_head(&vma->exec_node,
 				       &eb->buckets[handle & eb->and]);
 		}
+		++i;
 	}
-	spin_unlock(&file->table_lock);
 
-	return 0;
+
+out:
+	while (!list_empty(&objects)) {
+		obj = list_first_entry(&objects,
+				       struct drm_i915_gem_object,
+				       obj_exec_link);
+		list_del_init(&obj->obj_exec_link);
+		if (ret)
+			drm_gem_object_unreference(&obj->base);
+	}
+	return ret;
 }
 
-static struct drm_i915_gem_object *
-eb_get_object(struct eb_objects *eb, unsigned long handle)
+static struct i915_vma *eb_get_vma(struct eb_vmas *eb, unsigned long handle)
 {
 	if (eb->and < 0) {
 		if (handle >= -eb->and)
@@ -139,27 +172,25 @@  eb_get_object(struct eb_objects *eb, unsigned long handle)
 
 		head = &eb->buckets[handle & eb->and];
 		hlist_for_each(node, head) {
-			struct drm_i915_gem_object *obj;
+			struct i915_vma *vma;
 
-			obj = hlist_entry(node, struct drm_i915_gem_object, exec_node);
-			if (obj->exec_handle == handle)
-				return obj;
+			vma = hlist_entry(node, struct i915_vma, exec_node);
+			if (vma->exec_handle == handle)
+				return vma;
 		}
 		return NULL;
 	}
 }
 
-static void
-eb_destroy(struct eb_objects *eb)
-{
-	while (!list_empty(&eb->objects)) {
-		struct drm_i915_gem_object *obj;
+static void eb_destroy(struct eb_vmas *eb) {
+	while (!list_empty(&eb->vmas)) {
+		struct i915_vma *vma;
 
-		obj = list_first_entry(&eb->objects,
-				       struct drm_i915_gem_object,
+		vma = list_first_entry(&eb->vmas,
+				       struct i915_vma,
 				       exec_list);
-		list_del_init(&obj->exec_list);
-		drm_gem_object_unreference(&obj->base);
+		list_del_init(&vma->exec_list);
+		drm_gem_object_unreference(&vma->obj->base);
 	}
 	kfree(eb);
 }
@@ -173,22 +204,24 @@  static inline int use_cpu_reloc(struct drm_i915_gem_object *obj)
 
 static int
 i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
-				   struct eb_objects *eb,
+				   struct eb_vmas *eb,
 				   struct drm_i915_gem_relocation_entry *reloc,
 				   struct i915_address_space *vm)
 {
 	struct drm_device *dev = obj->base.dev;
 	struct drm_gem_object *target_obj;
 	struct drm_i915_gem_object *target_i915_obj;
+	struct i915_vma *target_vma;
 	uint32_t target_offset;
 	int ret = -EINVAL;
 
 	/* we've already hold a reference to all valid objects */
-	target_obj = &eb_get_object(eb, reloc->target_handle)->base;
-	if (unlikely(target_obj == NULL))
+	target_vma = eb_get_vma(eb, reloc->target_handle);
+	if (unlikely(target_vma == NULL))
 		return -ENOENT;
+	target_i915_obj = target_vma->obj;
+	target_obj = &target_vma->obj->base;
 
-	target_i915_obj = to_intel_bo(target_obj);
 	target_offset = i915_gem_obj_ggtt_offset(target_i915_obj);
 
 	/* Sandybridge PPGTT errata: We need a global gtt mapping for MI and
@@ -297,14 +330,13 @@  i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
 }
 
 static int
-i915_gem_execbuffer_relocate_object(struct drm_i915_gem_object *obj,
-				    struct eb_objects *eb,
-				    struct i915_address_space *vm)
+i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
+				 struct eb_vmas *eb)
 {
 #define N_RELOC(x) ((x) / sizeof(struct drm_i915_gem_relocation_entry))
 	struct drm_i915_gem_relocation_entry stack_reloc[N_RELOC(512)];
 	struct drm_i915_gem_relocation_entry __user *user_relocs;
-	struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
+	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	int remain, ret;
 
 	user_relocs = to_user_ptr(entry->relocs_ptr);
@@ -323,8 +355,8 @@  i915_gem_execbuffer_relocate_object(struct drm_i915_gem_object *obj,
 		do {
 			u64 offset = r->presumed_offset;
 
-			ret = i915_gem_execbuffer_relocate_entry(obj, eb, r,
-								 vm);
+			ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, r,
+								 vma->vm);
 			if (ret)
 				return ret;
 
@@ -345,17 +377,16 @@  i915_gem_execbuffer_relocate_object(struct drm_i915_gem_object *obj,
 }
 
 static int
-i915_gem_execbuffer_relocate_object_slow(struct drm_i915_gem_object *obj,
-					 struct eb_objects *eb,
-					 struct drm_i915_gem_relocation_entry *relocs,
-					 struct i915_address_space *vm)
+i915_gem_execbuffer_relocate_vma_slow(struct i915_vma *vma,
+				      struct eb_vmas *eb,
+				      struct drm_i915_gem_relocation_entry *relocs)
 {
-	const struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
+	const struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	int i, ret;
 
 	for (i = 0; i < entry->relocation_count; i++) {
-		ret = i915_gem_execbuffer_relocate_entry(obj, eb, &relocs[i],
-							 vm);
+		ret = i915_gem_execbuffer_relocate_entry(vma->obj, eb, &relocs[i],
+							 vma->vm);
 		if (ret)
 			return ret;
 	}
@@ -364,10 +395,10 @@  i915_gem_execbuffer_relocate_object_slow(struct drm_i915_gem_object *obj,
 }
 
 static int
-i915_gem_execbuffer_relocate(struct eb_objects *eb,
+i915_gem_execbuffer_relocate(struct eb_vmas *eb,
 			     struct i915_address_space *vm)
 {
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 	int ret = 0;
 
 	/* This is the fast path and we cannot handle a pagefault whilst
@@ -378,8 +409,8 @@  i915_gem_execbuffer_relocate(struct eb_objects *eb,
 	 * lockdep complains vehemently.
 	 */
 	pagefault_disable();
-	list_for_each_entry(obj, &eb->objects, exec_list) {
-		ret = i915_gem_execbuffer_relocate_object(obj, eb, vm);
+	list_for_each_entry(vma, &eb->vmas, exec_list) {
+		ret = i915_gem_execbuffer_relocate_vma(vma, eb);
 		if (ret)
 			break;
 	}
@@ -392,31 +423,32 @@  i915_gem_execbuffer_relocate(struct eb_objects *eb,
 #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
 
 static int
-need_reloc_mappable(struct drm_i915_gem_object *obj)
+need_reloc_mappable(struct i915_vma *vma)
 {
-	struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
-	return entry->relocation_count && !use_cpu_reloc(obj);
+	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
+	return entry->relocation_count && !use_cpu_reloc(vma->obj) &&
+		i915_is_ggtt(vma->vm);
 }
 
 static int
-i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
-				   struct intel_ring_buffer *ring,
-				   struct i915_address_space *vm,
-				   bool *need_reloc)
+i915_gem_execbuffer_reserve_vma(struct i915_vma *vma,
+				struct intel_ring_buffer *ring,
+				bool *need_reloc)
 {
-	struct drm_i915_private *dev_priv = obj->base.dev->dev_private;
-	struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
 	bool need_fence, need_mappable;
+	struct drm_i915_gem_object *obj = vma->obj;
 	int ret;
 
 	need_fence =
 		has_fenced_gpu_access &&
 		entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
 		obj->tiling_mode != I915_TILING_NONE;
-	need_mappable = need_fence || need_reloc_mappable(obj);
+	need_mappable = need_fence || need_reloc_mappable(vma);
 
-	ret = i915_gem_object_pin(obj, vm, entry->alignment, need_mappable,
+	ret = i915_gem_object_pin(obj, vma->vm, entry->alignment, need_mappable,
 				  false);
 	if (ret)
 		return ret;
@@ -444,8 +476,8 @@  i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
 		obj->has_aliasing_ppgtt_mapping = 1;
 	}
 
-	if (entry->offset != i915_gem_obj_offset(obj, vm)) {
-		entry->offset = i915_gem_obj_offset(obj, vm);
+	if (entry->offset != vma->node.start) {
+		entry->offset = vma->node.start;
 		*need_reloc = true;
 	}
 
@@ -462,14 +494,15 @@  i915_gem_execbuffer_reserve_object(struct drm_i915_gem_object *obj,
 }
 
 static void
-i915_gem_execbuffer_unreserve_object(struct drm_i915_gem_object *obj)
+i915_gem_execbuffer_unreserve_vma(struct i915_vma *vma)
 {
 	struct drm_i915_gem_exec_object2 *entry;
+	struct drm_i915_gem_object *obj = vma->obj;
 
-	if (!i915_gem_obj_bound_any(obj))
+	if (!drm_mm_node_allocated(&vma->node))
 		return;
 
-	entry = obj->exec_entry;
+	entry = vma->exec_entry;
 
 	if (entry->flags & __EXEC_OBJECT_HAS_FENCE)
 		i915_gem_object_unpin_fence(obj);
@@ -482,41 +515,40 @@  i915_gem_execbuffer_unreserve_object(struct drm_i915_gem_object *obj)
 
 static int
 i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
-			    struct list_head *objects,
-			    struct i915_address_space *vm,
+			    struct list_head *vmas,
 			    bool *need_relocs)
 {
 	struct drm_i915_gem_object *obj;
-	struct list_head ordered_objects;
+	struct i915_vma *vma;
+	struct list_head ordered_vmas;
 	bool has_fenced_gpu_access = INTEL_INFO(ring->dev)->gen < 4;
 	int retry;
 
-	INIT_LIST_HEAD(&ordered_objects);
-	while (!list_empty(objects)) {
+	INIT_LIST_HEAD(&ordered_vmas);
+	while (!list_empty(vmas)) {
 		struct drm_i915_gem_exec_object2 *entry;
 		bool need_fence, need_mappable;
 
-		obj = list_first_entry(objects,
-				       struct drm_i915_gem_object,
-				       exec_list);
-		entry = obj->exec_entry;
+		vma = list_first_entry(vmas, struct i915_vma, exec_list);
+		obj = vma->obj;
+		entry = vma->exec_entry;
 
 		need_fence =
 			has_fenced_gpu_access &&
 			entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
 			obj->tiling_mode != I915_TILING_NONE;
-		need_mappable = need_fence || need_reloc_mappable(obj);
+		need_mappable = need_fence || need_reloc_mappable(vma);
 
 		if (need_mappable)
-			list_move(&obj->exec_list, &ordered_objects);
+			list_move(&vma->exec_list, &ordered_vmas);
 		else
-			list_move_tail(&obj->exec_list, &ordered_objects);
+			list_move_tail(&vma->exec_list, &ordered_vmas);
 
 		obj->base.pending_read_domains = I915_GEM_GPU_DOMAINS & ~I915_GEM_DOMAIN_COMMAND;
 		obj->base.pending_write_domain = 0;
 		obj->pending_fenced_gpu_access = false;
 	}
-	list_splice(&ordered_objects, objects);
+	list_splice(&ordered_vmas, vmas);
 
 	/* Attempt to pin all of the buffers into the GTT.
 	 * This is done in 3 phases:
@@ -535,47 +567,47 @@  i915_gem_execbuffer_reserve(struct intel_ring_buffer *ring,
 		int ret = 0;
 
 		/* Unbind any ill-fitting objects or pin. */
-		list_for_each_entry(obj, objects, exec_list) {
-			struct drm_i915_gem_exec_object2 *entry = obj->exec_entry;
+		list_for_each_entry(vma, vmas, exec_list) {
+			struct drm_i915_gem_exec_object2 *entry = vma->exec_entry;
 			bool need_fence, need_mappable;
-			u32 obj_offset;
 
-			if (!i915_gem_obj_bound(obj, vm))
+			obj = vma->obj;
+
+			if (!drm_mm_node_allocated(&vma->node))
 				continue;
 
-			obj_offset = i915_gem_obj_offset(obj, vm);
 			need_fence =
 				has_fenced_gpu_access &&
 				entry->flags & EXEC_OBJECT_NEEDS_FENCE &&
 				obj->tiling_mode != I915_TILING_NONE;
-			need_mappable = need_fence || need_reloc_mappable(obj);
+			need_mappable = need_fence || need_reloc_mappable(vma);
 
 			WARN_ON((need_mappable || need_fence) &&
-				!i915_is_ggtt(vm));
+			       !i915_is_ggtt(vma->vm));
 
 			if ((entry->alignment &&
-			     obj_offset & (entry->alignment - 1)) ||
+			     vma->node.start & (entry->alignment - 1)) ||
 			    (need_mappable && !obj->map_and_fenceable))
-				ret = i915_vma_unbind(i915_gem_obj_to_vma(obj, vm));
+				ret = i915_vma_unbind(vma);
 			else
-				ret = i915_gem_execbuffer_reserve_object(obj, ring, vm, need_relocs);
+				ret = i915_gem_execbuffer_reserve_vma(vma, ring, need_relocs);
 			if (ret)
 				goto err;
 		}
 
 		/* Bind fresh objects */
-		list_for_each_entry(obj, objects, exec_list) {
-			if (i915_gem_obj_bound(obj, vm))
+		list_for_each_entry(vma, vmas, exec_list) {
+			if (drm_mm_node_allocated(&vma->node))
 				continue;
 
-			ret = i915_gem_execbuffer_reserve_object(obj, ring, vm, need_relocs);
+			ret = i915_gem_execbuffer_reserve_vma(vma, ring, need_relocs);
 			if (ret)
 				goto err;
 		}
 
 err:		/* Decrement pin count for bound objects */
-		list_for_each_entry(obj, objects, exec_list)
-			i915_gem_execbuffer_unreserve_object(obj);
+		list_for_each_entry(vma, vmas, exec_list)
+			i915_gem_execbuffer_unreserve_vma(vma);
 
 		if (ret != -ENOSPC || retry++)
 			return ret;
@@ -591,24 +623,27 @@  i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 				  struct drm_i915_gem_execbuffer2 *args,
 				  struct drm_file *file,
 				  struct intel_ring_buffer *ring,
-				  struct eb_objects *eb,
-				  struct drm_i915_gem_exec_object2 *exec,
-				  struct i915_address_space *vm)
+				  struct eb_vmas *eb,
+				  struct drm_i915_gem_exec_object2 *exec)
 {
 	struct drm_i915_gem_relocation_entry *reloc;
-	struct drm_i915_gem_object *obj;
+	struct i915_address_space *vm;
+	struct i915_vma *vma;
 	bool need_relocs;
 	int *reloc_offset;
 	int i, total, ret;
 	int count = args->buffer_count;
 
+	if (WARN_ON(list_empty(&eb->vmas)))
+		return 0;
+
+	vm = list_first_entry(&eb->vmas, struct i915_vma, exec_list)->vm;
+
 	/* We may process another execbuffer during the unlock... */
-	while (!list_empty(&eb->objects)) {
-		obj = list_first_entry(&eb->objects,
-				       struct drm_i915_gem_object,
-				       exec_list);
-		list_del_init(&obj->exec_list);
-		drm_gem_object_unreference(&obj->base);
+	while (!list_empty(&eb->vmas)) {
+		vma = list_first_entry(&eb->vmas, struct i915_vma, exec_list);
+		list_del_init(&vma->exec_list);
+		drm_gem_object_unreference(&vma->obj->base);
 	}
 
 	mutex_unlock(&dev->struct_mutex);
@@ -672,20 +707,19 @@  i915_gem_execbuffer_relocate_slow(struct drm_device *dev,
 
 	/* reacquire the objects */
 	eb_reset(eb);
-	ret = eb_lookup_objects(eb, exec, args, file);
+	ret = eb_lookup_vmas(eb, exec, args, vm, file);
 	if (ret)
 		goto err;
 
 	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
-	ret = i915_gem_execbuffer_reserve(ring, &eb->objects, vm, &need_relocs);
+	ret = i915_gem_execbuffer_reserve(ring, &eb->vmas, &need_relocs);
 	if (ret)
 		goto err;
 
-	list_for_each_entry(obj, &eb->objects, exec_list) {
-		int offset = obj->exec_entry - exec;
-		ret = i915_gem_execbuffer_relocate_object_slow(obj, eb,
-							       reloc + reloc_offset[offset],
-							       vm);
+	list_for_each_entry(vma, &eb->vmas, exec_list) {
+		int offset = vma->exec_entry - exec;
+		ret = i915_gem_execbuffer_relocate_vma_slow(vma, eb,
+							    reloc + reloc_offset[offset]);
 		if (ret)
 			goto err;
 	}
@@ -704,14 +738,15 @@  err:
 
 static int
 i915_gem_execbuffer_move_to_gpu(struct intel_ring_buffer *ring,
-				struct list_head *objects)
+				struct list_head *vmas)
 {
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 	uint32_t flush_domains = 0;
 	bool flush_chipset = false;
 	int ret;
 
-	list_for_each_entry(obj, objects, exec_list) {
+	list_for_each_entry(vma, vmas, exec_list) {
+		struct drm_i915_gem_object *obj = vma->obj;
 		ret = i915_gem_object_sync(obj, ring);
 		if (ret)
 			return ret;
@@ -786,13 +821,13 @@  validate_exec_list(struct drm_i915_gem_exec_object2 *exec,
 }
 
 static void
-i915_gem_execbuffer_move_to_active(struct list_head *objects,
-				   struct i915_address_space *vm,
+i915_gem_execbuffer_move_to_active(struct list_head *vmas,
 				   struct intel_ring_buffer *ring)
 {
-	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
 
-	list_for_each_entry(obj, objects, exec_list) {
+	list_for_each_entry(vma, vmas, exec_list) {
+		struct drm_i915_gem_object *obj = vma->obj;
 		u32 old_read = obj->base.read_domains;
 		u32 old_write = obj->base.write_domain;
 
@@ -803,7 +838,7 @@  i915_gem_execbuffer_move_to_active(struct list_head *objects,
 		obj->fenced_gpu_access = obj->pending_fenced_gpu_access;
 
 		/* FIXME: This lookup gets fixed later <-- danvet */
-		list_move_tail(&i915_gem_obj_to_vma(obj, vm)->mm_list, &vm->active_list);
+		list_move_tail(&vma->mm_list, &vma->vm->active_list);
 		i915_gem_object_move_to_active(obj, ring);
 		if (obj->base.write_domain) {
 			obj->dirty = 1;
@@ -862,7 +897,7 @@  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		       struct i915_address_space *vm)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
-	struct eb_objects *eb;
+	struct eb_vmas *eb;
 	struct drm_i915_gem_object *batch_obj;
 	struct drm_clip_rect *cliprects = NULL;
 	struct intel_ring_buffer *ring;
@@ -1002,7 +1037,7 @@  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto pre_mutex_err;
 	}
 
-	eb = eb_create(args);
+	eb = eb_create(args, vm);
 	if (eb == NULL) {
 		mutex_unlock(&dev->struct_mutex);
 		ret = -ENOMEM;
@@ -1010,18 +1045,16 @@  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	}
 
 	/* Look up object handles */
-	ret = eb_lookup_objects(eb, exec, args, file);
+	ret = eb_lookup_vmas(eb, exec, args, vm, file);
 	if (ret)
 		goto err;
 
 	/* take note of the batch buffer before we might reorder the lists */
-	batch_obj = list_entry(eb->objects.prev,
-			       struct drm_i915_gem_object,
-			       exec_list);
+	batch_obj = list_entry(eb->vmas.prev, struct i915_vma, exec_list)->obj;
 
 	/* Move the objects en-masse into the GTT, evicting if necessary. */
 	need_relocs = (args->flags & I915_EXEC_NO_RELOC) == 0;
-	ret = i915_gem_execbuffer_reserve(ring, &eb->objects, vm, &need_relocs);
+	ret = i915_gem_execbuffer_reserve(ring, &eb->vmas, &need_relocs);
 	if (ret)
 		goto err;
 
@@ -1031,7 +1064,7 @@  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (ret) {
 		if (ret == -EFAULT) {
 			ret = i915_gem_execbuffer_relocate_slow(dev, args, file, ring,
-								eb, exec, vm);
+								eb, exec);
 			BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 		}
 		if (ret)
@@ -1053,7 +1086,7 @@  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	if (flags & I915_DISPATCH_SECURE && !batch_obj->has_global_gtt_mapping)
 		i915_gem_gtt_bind_object(batch_obj, batch_obj->cache_level);
 
-	ret = i915_gem_execbuffer_move_to_gpu(ring, &eb->objects);
+	ret = i915_gem_execbuffer_move_to_gpu(ring, &eb->vmas);
 	if (ret)
 		goto err;
 
@@ -1108,7 +1141,7 @@  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 
 	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
 
-	i915_gem_execbuffer_move_to_active(&eb->objects, vm, ring);
+	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
 	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
 
 err: